May 21, 2023, 12:37:41

How to combine two dataframes conditionally and then write the output as a text file?

Question

with open('readme.txt', 'w') as f:
    for i in range(len(df1)):
        # Write the current df1 row as one line of text
        row_df1 = df1.loc[[i]]
        f.write(row_df1.to_string(index=False, header=False) + '\n')

        # Select the df2 rows whose D4, D5 and D6 equal this row's P4, P5 and P6
        condition = (df2['D4'] == row_df1['P4'].values[0]) & (df2['D5'] == row_df1['P5'].values[0]) & (df2['D6'] == row_df1['P6'].values[0])
        filtered_df2 = df2[condition]

        # Write the matches, or a placeholder line when there are none
        if not filtered_df2.empty:
            f.write(filtered_df2.to_string(index=False, header=False) + '\n')
        else:
            f.write('No event\n')

My dataset consists of two data frames. For simplicity, here are two small examples, df1 and df2 (the actual dataset is large):

import pandas as pd

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                   })

df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                   })

I need to combine them iteratively: take the first row of df1 (then the second, and so on), write it to a text file, then scan df2 and select the rows where

df3 = df1['P4'] == df2['D4'] and df1['P5'] == df2['D5'] and df1['P6'] == df2['D6']

(and write df3 to the same text file).

This process repeats for every row of df1 and produces a text file in the format shown below.
In most cases the condition is not fulfilled; a line with 'No event' should then be written instead.

Expected output

0  2019   1   1  6.0  11.4  32.44
1 2019   1   5  6.0  11.4  32.44  AB
2 2019   6  10  6.0  11.4  32.44  IJ
3 2018   1   5  6.0  11.4  32.44  YT
4 2019   2   1  2.3  18.0  31.56
5 2019   2   6  2.3  18.0  31.56  BC
6 2019   7  11  2.3  18.0  31.56  KL
7 2018   2   4  2.3  18.0  31.56  RT
8 2018   8   8  8.8  18.8  18.00
No event

Here is what I did so far:

df1 = pd.DataFrame({'P1': [2019, 2019, 2018, 2019, 2019, 2019],
                    'P2': [1, 2, 8, 3, 4, 5],
                    'P3': [1, 1, 8, 1, 1, 1],
                    'P4': [6, 2.3, 8.8, 4.6, 5.3, 7],
                    'P5': [11.4, 18, 18.8, 25, 12, 27.4],
                    'P6': [32.44, 31.56, 18, 33.01, 31.24, 31.95]
                   })


df2 = pd.DataFrame({'D1': [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2018, 2018],
                    'D2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5],
                    'D3': [5, 6, 3, 2, 1, 10, 11, 12, 7, 6, 5, 4, 1, 2, 6],
                    'D4': [6, 2.3, 4.6, 5.3, 7, 6, 2.3, 4.6, 5.3, 7,6, 2.3, 4.6, 5.3, 7],
                    'D5': [11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4, 11.4, 18, 25, 12, 27.4],
                    'D6': [32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95, 32.44, 31.56, 33.01, 31.24, 31.95],
                    'ST': ['AB', 'BC', 'CD', 'EF', 'GH', 'IJ', 'KL', 'ZY', 'ST', 'QD', 'YT', 'RT', 'EW', 'SD', 'FF']
                   })
L=len(df1)

with open('readme.txt', 'w') as f:
    
    for i in range(L):
        a=df1.loc[[i]]
        f.write(a, /n)
        C1=a['P4']
        C2=a['P5']
        C3=a['P6']
        if df3 = df2[(df2['D4'] == C1) and df2['D5']==C2 and df2['D6']==C3]
        f.write(df3, /n)
        else
        f.write(n/,'no event'/n)

print(df1)

My script still does not produce the required result. Could someone suggest how I can improve or update it?

Thank you!

Answer 1

Score: 0

I noticed a few issues in the code provided:

The if and else statements are not indented properly and are missing a colon (:).
When writing to a file you need to pass a string, not a pandas object.
Since you want to write multiple lines (when several rows of df2 match the selected row of df1), you need to loop over df2 as well.
f.write() takes exactly one argument; to add a line break ('\n', not '/n'), either append it to the string or write it in a separate call.
The example output you provided includes elements not mentioned in the if statement. I have added some, although I am not sure they are the ones you want.
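The points about strings and line breaks can be seen in isolation in the minimal sketch below. The tiny frame df, the mask, and the io.StringIO buffer standing in for the file are made up for illustration and are not from the question. It also shows a related detail not listed above: when whole columns are compared rather than single values, the conditions are combined with & instead of and.

```python
import io

import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({'D4': [6.0, 2.3], 'D5': [11.4, 18.0], 'ST': ['AB', 'BC']})

buffer = io.StringIO()  # stands in for the open text file

# Column-wise comparison: combine the conditions with & (not `and`)
# and wrap each comparison in parentheses.
mask = (df['D4'] == 6.0) & (df['D5'] == 11.4)
matches = df[mask]

# write() needs a single string, so convert the frame to text first
# and append the line break by concatenation.
buffer.write(matches.to_string(index=False, header=False) + '\n')

written = buffer.getvalue()
print(written)
```

With a real file, the same write call would go inside a with open('readme.txt', 'w') as f: block.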

Here are the mentioned changes applied to your code:

L1 = len(df1)
L2 = len(df2)

with open('readme.txt', 'w') as f:
    for i in range(L1):
        # Current row of df1 (a Series), written as a list of its values
        a = df1.loc[i]
        f.write(str([element for element in a]))
        f.write('\n')
        found_line = False

        # Scan every row of df2 for a match on the three key columns
        for j in range(L2):
            b = df2.loc[j]
            if (b['D4'] == a['P4']) and (b['D5'] == a['P5']) and (b['D6'] == a['P6']):
                elements = [b['D1'], b['D2'], b['D3'], a['P4'], a['P5'], a['P6'], b['ST']]
                f.write(str(elements))
                f.write('\n')
                found_line = True

        if not found_line:
            f.write('no event \n')
            # Stops at the first df1 row without a match, as in the expected
            # output in the question; remove this break to process every row.
            break

This results in the following .txt file:

[2019.0, 1.0, 1.0, 6.0, 11.4, 32.44]
[2019, 1, 5, 6.0, 11.4, 32.44, 'AB']
[2019, 6, 10, 6.0, 11.4, 32.44, 'IJ']
[2018, 1, 5, 6.0, 11.4, 32.44, 'YT']
[2019.0, 2.0, 1.0, 2.3, 18.0, 31.56]
[2019, 2, 6, 2.3, 18.0, 31.56, 'BC']
[2019, 7, 11, 2.3, 18.0, 31.56, 'KL']
[2018, 2, 4, 2.3, 18.0, 31.56, 'RT']
[2018.0, 8.0, 8.0, 8.8, 18.8, 18.0]
no event
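For larger inputs, the inner Python loop over df2 could be replaced by one boolean mask per df1 row. The following is a sketch under stated assumptions rather than a drop-in replacement: the helper name write_matches, the use of numpy.isclose (so the match does not rely on exact float equality), the space-separated row format, and the small demo frames are all illustrative choices that do not come from the answer above. It has no break, so every row of df1 is processed.

```python
import io

import numpy as np
import pandas as pd


def write_matches(df1, df2, out):
    """Write each df1 row, then the df2 rows whose D4-D6 match its P4-P6."""
    for _, row in df1.iterrows():
        # The df1 row itself, as one space-separated line of text.
        out.write(' '.join(str(v) for v in row.tolist()) + '\n')

        # Tolerant float comparison on the three key columns.
        mask = (
            np.isclose(df2['D4'], row['P4'])
            & np.isclose(df2['D5'], row['P5'])
            & np.isclose(df2['D6'], row['P6'])
        )
        matched = df2[mask]

        if matched.empty:
            out.write('No event\n')
        else:
            out.write(matched.to_string(index=False, header=False) + '\n')


# Small made-up frames for demonstration only.
df1 = pd.DataFrame({'P4': [6.0, 8.8], 'P5': [11.4, 18.8], 'P6': [32.44, 18.0]})
df2 = pd.DataFrame({'D4': [6.0, 6.0, 2.3],
                    'D5': [11.4, 11.4, 18.0],
                    'D6': [32.44, 32.44, 31.56],
                    'ST': ['AB', 'IJ', 'BC']})

buffer = io.StringIO()  # stands in for open('readme.txt', 'w')
write_matches(df1, df2, buffer)
result = buffer.getvalue()
print(result)
```

To write to disk instead, pass the handle from with open('readme.txt', 'w') as f: in place of the buffer. Whether a tolerance is appropriate depends on how the key columns were produced; for values that are known to be exact, plain == comparisons behave the same way.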
