May 25, 2023, 14:54:18

How to join two pandas DataFrames on trailing part of path / filename

Question

I have two DataFrames as follows:

df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg',
                               r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg',
                               r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg',
                               r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})

I want to join df2 to df1 on the condition that PATH contains F_NAME (that is, PATH.Contains(F_NAME)), so that the resulting DataFrame looks like this:

df3 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'F_NAME': [r'FODLER\Test1.jpg',
                               r'FODLER\Test2.jpg',
                               r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg'],
                    'VALUE_X': ['12', '25', '33', '123'],
                    'CORDS': ['1', '2', '4', '5'],
                    'VALUE': [45, 23, 45, 2]})

How do I write the pandas merge statement to perform this join?

Answer 1

Score: 2

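You can merge on a computed key: build a regex from the escaped F_NAME values, anchor it to the end of the string with $, and use str.extract to pull the matching trailing part out of PATH as the left key: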

import re
pattern = f"({'|'.join(df2['F_NAME'].apply(re.escape))})$"
df3 = df1.merge(df2, left_on=df1['PATH'].str.extract(pattern, expand=False),
                right_on='F_NAME', how='left')

Output:

                    PATH  VALUE            F_NAME VALUE_X CORDS
0    C:\FODLER\Test1.jpg     45  FODLER\Test1.jpg      12     1
1  C:\A\FODLER\Test2.jpg     23  FODLER\Test2.jpg      25     2
2  C:\A\FODLER\Test3.jpg     45  FODLER\Test3.jpg      33     4
3  C:\A\FODLER\Test4.jpg      2  FODLER\Test4.jpg     123     5

The generated pattern:

(FODLER\\Test1\.jpg|FODLER\\Test2\.jpg|FODLER\\Test6\.jpg|FODLER\\Test3\.jpg|FODLER\\Test4\.jpg|FODLER\\Test9\.jpg)$
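The role of re.escape and of the $ anchor in that pattern can be checked in isolation. The following is a minimal sketch using only the standard library; the variable names (name, path, anchored, and so on) are made up for illustration, and the ".bak" suffix is a hypothetical example of a path that contains the name without ending in it.

```python
import re

name = r"FODLER\Test1.jpg"          # one F_NAME value from the question
path = r"C:\A\FODLER\Test1.jpg"     # a path that ends with that value

# Unescaped, "\T" is not a valid regex escape, so compiling the raw value fails.
try:
    re.compile(name + "$")
    compiled_unescaped = True
except re.error:
    compiled_unescaped = False

# re.escape turns the backslash and the dot into literals.
escaped = re.escape(name)

# Anchoring with $ only accepts the name at the very end of the path.
anchored = re.compile(f"({escaped})$")
match = anchored.search(path)
matched_text = match.group(1) if match else None

# A path that merely contains the name (hypothetical ".bak" suffix) is rejected.
no_match = anchored.search(path + ".bak")
```

Dropping the $ would turn the match into a plain "contains" test, which is closer to the wording of the question but also accepts the name anywhere inside the path.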

Alternatively, if F_NAME always consists of exactly the last two components of PATH (folder\filename.ext), you can extract that trailing part into an F_NAME column and merge on it:

df3 = (df1
    .assign(F_NAME=df1['PATH'].str.extract(r'([^\\]+\\[^\\]+)$', expand=False))
    .merge(df2, how='left')
)
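The same trailing-part key can also be built without a regex. The sketch below is an alternative under the same assumption (F_NAME is always the last two path components); it uses pathlib.PureWindowsPath from the standard library, and the helper name trailing_parts and its n parameter are hypothetical, not part of the original answer.

```python
from pathlib import PureWindowsPath

import pandas as pd

df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg', r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg', r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg', r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})


def trailing_parts(path: str, n: int = 2) -> str:
    """Return the last n components of a Windows path, joined with backslashes."""
    return '\\'.join(PureWindowsPath(path).parts[-n:])


# Compute the join key as a regular column, then merge on it by name.
df3 = (df1
    .assign(F_NAME=df1['PATH'].map(trailing_parts))
    .merge(df2, on='F_NAME', how='left')
)
```

PureWindowsPath parses backslash-separated paths on any operating system, so the sketch does not depend on running under Windows.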

Answer 2

Score: 1

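You can try this. The F_NAME values are escaped with re.escape so that they are safe to use in a regex, and the merge uses the extracted match, not the full PATH, as the left key:

import re
key = df1['PATH'].str.extract('(' + '|'.join(map(re.escape, df2['F_NAME'])) + ')', expand=False)  # matched F_NAME, or NaN
df3 = df1[key.notna()].merge(df2, left_on=key.dropna(), right_on='F_NAME', how='left')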
print(df3)
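The PATH.Contains(F_NAME) condition from the question can also be written literally, without any regex, by pairing every row of df1 with every row of df2 and keeping the pairs where the substring test holds. This is only a sketch: it assumes pandas 1.2 or later (for how='cross'), the names pairs and mask are made up for illustration, and the intermediate frame has len(df1) * len(df2) rows, so it is suited to small inputs.

```python
import pandas as pd

df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg', r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg', r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg', r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})

# Cartesian product of the two frames (requires pandas >= 1.2).
pairs = df1.merge(df2, how='cross')

# Plain substring test per pair; no regex escaping is needed.
mask = [f_name in path for path, f_name in zip(pairs['PATH'], pairs['F_NAME'])]

df3 = pairs[mask].reset_index(drop=True)
print(df3)
```

Replacing the substring test with path.endswith(f_name) would restrict the match to the trailing part of the path.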
