How to combine two dataframes conditionally and then write the output as a text file?
Question
with open('readme.txt', 'w') as f:
    for i in range(len(df1)):
        row_df1 = df1.loc[[i]]
        f.write(row_df1.to_string(index=False, header=False) + '\n')
        condition = (df2['D4'] == row_df1['P4'].values[0]) & (df2['D5'] == row_df1['P5'].values[0]) & (df2['D6'] == row_df1['P6'].values[0])
        filtered_df2 = df2[condition]
        if not filtered_df2.empty:
            f.write(filtered_df2.to_string(index=False, header=False) + '\n')
        else:
            f.write('No event\n')
My dataset consists of two data frames. For simplicity they are shown here as small examples, df1 and df2 (the actual dataset is large):
import pandas as pd

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                    })
df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                    })
I need to combine them iteratively: take the first row of df1 (then the second, and so on), write it to a text file, then scan df2 and select the rows (call them df3) for which
df1['P4'] == df2['D4'] and df1['P5'] == df2['D5'] and df1['P6'] == df2['D6']
and write df3 to the same text file.
This process repeats for every row of df1 and produces a text file in the format shown below.
In most cases the condition is not fulfilled for any row of df2; a line with 'No event' should then be written instead.
Expected output:
0 2019 1 1 6.0 11.4 32.44
1 2019 1 5 6.0 11.4 32.44 AB
2 2019 6 10 6.0 11.4 32.44 IJ
3 2018 1 5 6.0 11.4 32.44 YT
4 2019 2 1 2.3 18.0 31.56
5 2019 2 6 2.3 18.0 31.56 BC
6 2019 7 11 2.3 18.0 31.56 KL
7 2018 2 4 2.3 18.0 31.56 RT
8 2018 8 8 8.8 18.8 18.00
No event
Here is what I did so far:
df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                    })
df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                    })
L=len(df1)
with open('readme.txt', 'w') as f:
    for i in range(L):
        a=df1.loc[[i]]
        f.write(a, /n)
        C1=a['P4']
        C2=a['P5']
        C3=a['P6']
        if df3 = df2[(df2['D4'] == C1) and df2['D5']==C2 and df2['D6']==C3]
        f.write(df3, /n)
        else
        f.write(n/,'no event'/n)
print(df1)
My script does not produce the required result yet. Could someone suggest how I can fix or improve it?
Thank you!
Answer 1
Score: 0
Hi, I noticed a few issues in the code provided:
- The if and else statements are not indented properly and are each missing a ':'.
- When writing to a file you need to pass a string (not a pandas object).
- As you wish to write multiple lines (should multiple rows of df2 match the selected row of df1), you will need to loop over df2 as well.
- f.write() only takes one argument. If you want to add a line break ('\n' instead of '/n'), either append it to the string or write it in a separate call.
- The example output you have provided includes elements not mentioned in the if statement. I have added some, although I am not sure these are the desired elements.
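A related pitfall in the question's condition is the use of and between whole columns, which raises a ValueError in pandas; element-wise conditions need & with each comparison in parentheses. The following is a minimal sketch of that point and of turning a pandas object into a string before writing. The trimmed-down df2 and the io.StringIO buffer standing in for the open file are assumptions for illustration only.

```python
import io
import pandas as pd

# Trimmed-down sample in the spirit of the question's df2 (illustrative only)
df2 = pd.DataFrame({'D4': [6.0, 2.3, 6.0],
                    'D5': [11.4, 18.0, 11.4],
                    'ST': ['AB', 'BC', 'IJ']})

# Element-wise comparison: use & and wrap each comparison in parentheses
mask = (df2['D4'] == 6.0) & (df2['D5'] == 11.4)
matched = df2[mask]

# Using `and` on two Series asks for a single truth value and fails
try:
    (df2['D4'] == 6.0) and (df2['D5'] == 11.4)
    and_failed = False
except ValueError:
    and_failed = True

# write() takes exactly one string, so convert the DataFrame first
buf = io.StringIO()  # stands in for the file opened with open(...)
buf.write(matched.to_string(index=False, header=False) + '\n')
written = buf.getvalue()
```

Here the line break is concatenated onto the string, which is equivalent to writing it in a separate call.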
Here are the mentioned changes applied to your code:
L1 = len(df1)
L2 = len(df2)
with open('readme.txt', 'w') as f:
    for i in range(L1):
        a = df1.loc[i]  # i-th row of df1 as a Series
        f.write(str([element for element in a]))
        f.write('\n')
        found_line = False
        for j in range(L2):
            b = df2.loc[j]
            # a df2 row matches when all three key columns are equal
            if (b['D4'] == a['P4']) and (b['D5'] == a['P5']) and (b['D6'] == a['P6']):
                elements = [b['D1'], b['D2'], b['D3'], a['P4'], a['P5'], a['P6'], b['ST']]
                f.write(str(elements))
                f.write('\n')
                found_line = True
        if not found_line:
            f.write('no event \n')
            break  # ends the outer loop at the first df1 row with no match; drop it to process every row
This results in the following .txt file:
[2019.0, 1.0, 1.0, 6.0, 11.4, 32.44]
[2019, 1, 5, 6.0, 11.4, 32.44, 'AB']
[2019, 6, 10, 6.0, 11.4, 32.44, 'IJ']
[2018, 1, 5, 6.0, 11.4, 32.44, 'YT']
[2019.0, 2.0, 1.0, 2.3, 18.0, 31.56]
[2019, 2, 6, 2.3, 18.0, 31.56, 'BC']
[2019, 7, 11, 2.3, 18.0, 31.56, 'KL']
[2018, 2, 4, 2.3, 18.0, 31.56, 'RT']
[2018.0, 8.0, 8.0, 8.8, 18.8, 18.0]
no event
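For a large dataset, looping over every row of df2 for each row of df1 can become slow. One possible alternative, offered as a sketch rather than as part of the code above, is to group df2 once by D4, D5 and D6 and then look up each df1 row in the resulting dictionary. The function name write_events, the trimmed sample data and the io.StringIO buffer standing in for the file are assumptions for illustration. Like the loop above, it relies on exact floating-point equality between the key columns.

```python
import io
import pandas as pd

def write_events(df1, df2, out):
    """Write each df1 row followed by its matching df2 rows, or 'No event'."""
    # Group df2 once; keys are (D4, D5, D6) tuples, values are the matching rows
    groups = {key: rows for key, rows in df2.groupby(['D4', 'D5', 'D6'])}
    for row in df1.itertuples(index=False):
        out.write(' '.join(str(v) for v in row) + '\n')
        match = groups.get((row.P4, row.P5, row.P6))
        if match is None:
            out.write('No event\n')
        else:
            out.write(match.to_string(index=False, header=False) + '\n')

# Trimmed sample taken from the question's data (first three rows of df1)
df1 = pd.DataFrame({'P1': [2019, 2019, 2018], 'P2': [1, 2, 8], 'P3': [1, 1, 8],
                    'P4': [6, 2.3, 8.8], 'P5': [11.4, 18, 18.8],
                    'P6': [32.44, 31.56, 18]})
df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2018], 'D2': [1, 2, 6, 1],
                    'D3': [5, 6, 10, 5], 'D4': [6, 2.3, 6, 6],
                    'D5': [11.4, 18, 11.4, 11.4],
                    'D6': [32.44, 31.56, 32.44, 32.44],
                    'ST': ['AB', 'BC', 'IJ', 'YT']})

buf = io.StringIO()  # replace with open('readme.txt', 'w') to write a real file
write_events(df1, df2, buf)
text = buf.getvalue()
```

Unlike the loop above, this version does not stop at the first row without a match; it writes 'No event' and continues with the next row of df1.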