How to combine two dataframes conditionally and then write the output to a text file?

Question


My dataset consists of two data frames. For simplicity, I use two small examples here, df1 and df2 (the actual dataset is large):

import pandas as pd

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                   })


df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                   })


I need to combine them iteratively: take the first row of df1 (then the second, and so on), write it to a text file, then scan df2 and select the rows for which

df3 = rows of df2 where df1['P4'] == df2['D4'] and df1['P5'] == df2['D5'] and df1['P6'] == df2['D6']

and write df3 to the same text file.

This process repeats for every row of df1 and produces a text file in the format shown below.
In most cases the condition is not fulfilled; a line with 'No event' should then be written instead.

Expected output

0  2019   1   1  6.0  11.4  32.44
1 2019   1   5  6.0  11.4  32.44  AB
2 2019   6  10  6.0  11.4  32.44  IJ
3 2018   1   5  6.0  11.4  32.44  YT
4 2019   2   1  2.3  18.0  31.56
5 2019   2   6  2.3  18.0  31.56  BC
6 2019   7  11  2.3  18.0  31.56  KL
7 2018   2   4  2.3  18.0  31.56  RT
8 2018   8   8  8.8  18.8  18.00
No event 

Here is what I did so far:

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                   })


df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7,6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                   })
L=len(df1)

with open('readme.txt', 'w') as f:
    
    for i in range(L):
        a=df1.loc[[i]]
        f.write(a, /n)
        C1=a['P4']
        C2=a['P5']
        C3=a['P6']
        if df3 = df2[(df2['D4'] == C1) and df2['D5']==C2 and df2['D6']==C3]
        f.write(df3, /n)
        else
        f.write(n/,'no event'/n)

print(df1)        


My script still does not produce the required result. Could someone suggest how I can fix or improve it?

Thank you!

Answer 1

Score: 0


Hi, I noticed a few issues in the code provided:

  1. The if and else statements are not indented properly and are each missing a trailing : (an assignment such as df3 = ... also cannot appear in the if condition)
  2. When writing to a file you need to pass a string (not a pandas object)
  3. As you wish to write multiple lines (should multiple rows of df2 match the selected row of df1), you will need to loop over df2 as well
  4. f.write() takes only one argument; to add a line break ('\n', not /n) you have to either append it to the string or write it in a separate call
  5. The example output you provided includes elements not mentioned in the if statement; I have added some, although I am not sure these are the desired elements
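Points 2 and 4, plus a related pitfall in the original condition, can be seen in isolation with a short sketch. It uses an in-memory io.StringIO buffer in place of a real file, and the two-row frame demo is made up for illustration; neither is part of the original script.

```python
import io
import pandas as pd

# Hypothetical two-row frame, used only for this illustration.
demo = pd.DataFrame({'P4': [6.0, 2.3], 'P5': [11.4, 18.0]})

buf = io.StringIO()  # stands in for the file object returned by open()

# write() takes exactly one string, so the row is converted with
# to_string() and the line break is concatenated onto it.
row = demo.loc[[0]]
buf.write(row.to_string(index=False, header=False) + '\n')

# Passing the DataFrame itself raises TypeError.
try:
    buf.write(row)
    write_error = None
except TypeError as exc:
    write_error = exc

# Conditions on whole columns are combined element-wise with &
# (each comparison in parentheses).
mask = (demo['P4'] == 6.0) & (demo['P5'] == 11.4)

# Python's `and` needs a single True/False and raises ValueError on a Series.
try:
    (demo['P4'] == 6.0) and (demo['P5'] == 11.4)
    and_error = None
except ValueError as exc:
    and_error = exc

written = buf.getvalue()
```

The loop-based version in this answer sidesteps the last problem by comparing one scalar at a time, where `and` is fine.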

Here are the mentioned changes applied to your code:

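L1 = len(df1)
L2 = len(df2)

with open('readme.txt', 'w') as f:
    for i in range(L1):
        a = df1.loc[i]
        # Write the current df1 row as a list of plain Python floats
        f.write(str(a.tolist()))
        f.write('\n')
        found_line = False

        # Scan every df2 row for a match on all three key columns
        for j in range(L2):
            b = df2.loc[j]
            if (b['D4'] == a['P4']) and (b['D5'] == a['P5']) and (b['D6'] == a['P6']):
                # Cast to built-in types so the list prints without NumPy wrappers
                elements = [int(b['D1']), int(b['D2']), int(b['D3']),
                            float(a['P4']), float(a['P5']), float(a['P6']), b['ST']]
                f.write(str(elements))
                f.write('\n')
                found_line = True

        # No df2 row matched this df1 row; carry on with the next one
        if not found_line:
            f.write('no event\n')

This results in the following .txt file:

[2019.0, 1.0, 1.0, 6.0, 11.4, 32.44]
[2019, 1, 5, 6.0, 11.4, 32.44, 'AB']
[2019, 6, 10, 6.0, 11.4, 32.44, 'IJ']
[2018, 1, 5, 6.0, 11.4, 32.44, 'YT']
[2019.0, 2.0, 1.0, 2.3, 18.0, 31.56]
[2019, 2, 6, 2.3, 18.0, 31.56, 'BC']
[2019, 7, 11, 2.3, 18.0, 31.56, 'KL']
[2018, 2, 4, 2.3, 18.0, 31.56, 'RT']
[2018.0, 8.0, 8.0, 8.8, 18.8, 18.0]
no event
[2019.0, 3.0, 1.0, 4.6, 25.0, 33.01]
[2019, 3, 3, 4.6, 25.0, 33.01, 'CD']
[2019, 8, 12, 4.6, 25.0, 33.01, 'ZY']
[2018, 3, 1, 4.6, 25.0, 33.01, 'EW']
[2019.0, 4.0, 1.0, 5.3, 12.0, 31.24]
[2019, 4, 2, 5.3, 12.0, 31.24, 'EF']
[2019, 9, 7, 5.3, 12.0, 31.24, 'ST']
[2018, 4, 2, 5.3, 12.0, 31.24, 'SD']
[2019.0, 5.0, 1.0, 7.0, 27.4, 31.95]
[2019, 5, 1, 7.0, 27.4, 31.95, 'GH']
[2019, 10, 6, 7.0, 27.4, 31.95, 'QD']
[2018, 5, 6, 7.0, 27.4, 31.95, 'FF']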
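Since the actual dataset is said to be large, the inner Python loop over df2 could also be replaced by a boolean mask, which lets pandas compare whole columns at once. What follows is a sketch of that alternative, not the method used above: the function name write_matches, its path argument, the demo frames left and right, and the use of to_string for formatting are all my own choices. It assumes exact float equality is acceptable, as in the original condition.

```python
import os
import tempfile
import pandas as pd

def write_matches(df1, df2, path):
    """Write each df1 row, then the df2 rows matching it on P4/P5/P6.

    Hypothetical helper: the name and signature are illustrative only.
    """
    with open(path, 'w') as f:
        for i in range(len(df1)):
            row = df1.iloc[[i]]  # one-row DataFrame, keeps the column dtypes
            f.write(row.to_string(index=False, header=False) + '\n')

            # True for every df2 row equal to this df1 row on all three keys.
            mask = ((df2['D4'] == row['P4'].iloc[0])
                    & (df2['D5'] == row['P5'].iloc[0])
                    & (df2['D6'] == row['P6'].iloc[0]))
            matches = df2[mask]

            if matches.empty:
                f.write('No event\n')
            else:
                f.write(matches.to_string(index=False, header=False) + '\n')

# Small made-up frames, just to exercise the helper.
left = pd.DataFrame({'P1': [2019, 2018], 'P4': [6.0, 8.8],
                     'P5': [11.4, 18.8], 'P6': [32.44, 18.0]})
right = pd.DataFrame({'D1': [2019, 2018], 'D4': [6.0, 6.0],
                      'D5': [11.4, 11.4], 'D6': [32.44, 32.44],
                      'ST': ['AB', 'YT']})

path = os.path.join(tempfile.gettempdir(), 'matches_demo.txt')
write_matches(left, right, path)
```

Calling write_matches(df1, df2, 'readme.txt') with the frames from the question would write the same rows as the loop above, formatted as space-separated columns instead of Python lists. If the key values come from calculations rather than identical literals, exact == on floats may miss matches; numpy.isclose is one way to compare with a tolerance.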

huangapple
  • Posted on May 21, 2023 at 12:37:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76298314.html