How to combine two dataframes conditionally and then write the output as a text file?
Question
with open('readme.txt', 'w') as f:
    for i in range(len(df1)):
        row_df1 = df1.loc[[i]]
        f.write(row_df1.to_string(index=False, header=False) + '\n')
        condition = (df2['D4'] == row_df1['P4'].values[0]) & (df2['D5'] == row_df1['P5'].values[0]) & (df2['D6'] == row_df1['P6'].values[0])
        filtered_df2 = df2[condition]
        if not filtered_df2.empty:
            f.write(filtered_df2.to_string(index=False, header=False) + '\n')
        else:
            f.write('No event\n')
My dataset consists of two data frames, called df1 and df2 here for simplicity (the actual dataset is large):
import pandas as pd

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                    })
df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                    })
I need to combine them iteratively: take the first row of df1 (then the second, and so on), write it to a text file, then scan df2 and select the rows where
df3 = df1['P4'] == df2['D4'] and df1['P5'] == df2['D5'] and df1['P6'] == df2['D6']
(and write df3 to the same text file).
This process repeats for every row of df1 and produces a text file in the format shown below.
In most cases the condition is not fulfilled; a line with 'No event' should then be written instead.
Expected output:
0 2019 1 1 6.0 11.4 32.44
1 2019 1 5 6.0 11.4 32.44 AB
2 2019 6 10 6.0 11.4 32.44 IJ
3 2018 1 5 6.0 11.4 32.44 YT
4 2019 2 1 2.3 18.0 31.56
5 2019 2 6 2.3 18.0 31.56 BC
6 2019 7 11 2.3 18.0 31.56 KL
7 2018 2 4 2.3 18.0 31.56 RT
8 2018 8 8 8.8 18.8 18.00
No event
Here is what I did so far:
df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                    })
df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                    })
L=len(df1)
with open('readme.txt', 'w') as f:
for i in range(L):
a=df1.loc[[i]]
f.write(a, /n)
C1=a['P4']
C2=a['P5']
C3=a['P6']
if df3 = df2[(df2['D4'] == C1) and df2['D5']==C2 and df2['D6']==C3]
f.write(df3, /n)
else
f.write(n/,'no event'/n)
print(df1)
My script is still not producing the required result. Could someone suggest how I can improve or update the script?
Thank you!
Answer 1
Score: 0
Hi, I noticed a few issues in the code provided:
- The if and else statements are not indented properly and are missing a ':'
- When writing to a file you need to pass a string (not a pandas object)
- As you wish to write multiple lines (should multiple rows of df2 match the selected row in df1), you will need to loop over df2 as well
- f.write() only takes one argument; if you want to add a line break ('\n' instead of '/n'), you have to append it to the string or write it in a separate call
- The example output you have provided includes elements not mentioned in the if statement. I have added some, although I am not sure if these are the desired elements
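To make the points about f.write() concrete, here is a minimal sketch. It assumes a one-row DataFrame and uses an io.StringIO buffer as a stand-in for the open text file; the frame and the names buf, row and text are made up for illustration.

```python
import io

import pandas as pd

# Tiny illustrative frame (hypothetical values, not the real dataset).
df = pd.DataFrame({'P4': [6.0], 'P5': [11.4]})

buf = io.StringIO()   # behaves like a file opened in text mode
row = df.loc[[0]]     # double brackets keep the row as a one-row DataFrame

# write() accepts exactly one str, so convert the row to text and
# append the newline to that string before writing.
buf.write(row.to_string(index=False, header=False) + '\n')

text = buf.getvalue()
```

Passing the DataFrame itself, as in f.write(row), raises a TypeError, which is why the conversion to a string is needed.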
Here are the mentioned changes applied to your code:
L1 = len(df1)
L2 = len(df2)
with open('readme.txt', 'w') as f:
    for i in range(L1):
        a = df1.loc[i]  # current row of df1 as a Series
        f.write(str([element for element in a]))
        f.write('\n')
        found_line = False
        # scan every row of df2 for a match on the three key columns
        for j in range(L2):
            b = df2.loc[j]
            if (b['D4'] == a['P4']) and (b['D5'] == a['P5']) and (b['D6'] == a['P6']):
                elements = [b['D1'], b['D2'], b['D3'], a['P4'], a['P5'], a['P6'], b['ST']]
                f.write(str(elements))
                f.write('\n')
                found_line = True
        if not found_line:
            f.write('no event \n')
            # stops at the first df1 row without a match, like the sample output;
            # remove this break to process every row of df1
            break
This results in the following .txt file:
[2019.0, 1.0, 1.0, 6.0, 11.4, 32.44]
[2019, 1, 5, 6.0, 11.4, 32.44, 'AB']
[2019, 6, 10, 6.0, 11.4, 32.44, 'IJ']
[2018, 1, 5, 6.0, 11.4, 32.44, 'YT']
[2019.0, 2.0, 1.0, 2.3, 18.0, 31.56]
[2019, 2, 6, 2.3, 18.0, 31.56, 'BC']
[2019, 7, 11, 2.3, 18.0, 31.56, 'KL']
[2018, 2, 4, 2.3, 18.0, 31.56, 'RT']
[2018.0, 8.0, 8.0, 8.8, 18.8, 18.0]
no event
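For a large dataset, scanning all of df2 for every row of df1 can get slow. One possible alternative, sketched below, groups df2 once by its key columns and then looks up each df1 row in that grouping. This is only a sketch under a few assumptions: matching means exact equality on (P4, P5, P6) versus (D4, D5, D6), the helper name build_report is made up, and the small frames are a cut-down version of the example data. Unlike the loop above, it keeps going after a 'No event' line.

```python
import pandas as pd


def build_report(df1, df2):
    """Return report lines: each df1 row followed by its df2 matches."""
    # Group df2 once by the key columns so each lookup avoids a full scan.
    groups = {key: grp for key, grp in df2.groupby(['D4', 'D5', 'D6'])}
    lines = []
    for row in df1.itertuples(index=False):
        lines.append(' '.join(str(v) for v in row))
        matches = groups.get((row.P4, row.P5, row.P6))
        if matches is None:
            lines.append('No event')
            continue
        # Rows inside a group keep their original df2 order.
        for m in matches.itertuples(index=False):
            lines.append(' '.join(str(v) for v in m))
    return lines


# Cut-down illustrative data (a subset of the frames in the question).
small_df1 = pd.DataFrame({'P1': [2019, 2018], 'P2': [1, 8], 'P3': [1, 8],
                          'P4': [6.0, 8.8], 'P5': [11.4, 18.8],
                          'P6': [32.44, 18.0]})
small_df2 = pd.DataFrame({'D1': [2019, 2019, 2018], 'D2': [1, 6, 1],
                          'D3': [5, 10, 5], 'D4': [6.0, 6.0, 6.0],
                          'D5': [11.4, 11.4, 11.4],
                          'D6': [32.44, 32.44, 32.44],
                          'ST': ['AB', 'IJ', 'YT']})

lines = build_report(small_df1, small_df2)
```

The lines can then be written in one call, for example f.write('\n'.join(lines) + '\n'). Note that exact equality on floats only works here because the values come from identical literals; if the key columns are computed, rounding them first may be needed.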