How to join two pandas DataFrames on trailing part of path / filename
Question
I have two DataFrames as follows:
df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg',
                               r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg',
                               r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg',
                               r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})
I want to join df2 to df1 on the rows where PATH contains F_NAME,
so that the resulting DataFrame is as follows:
df3 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'F_NAME': [r'FODLER\Test1.jpg',
                               r'FODLER\Test2.jpg',
                               r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg'],
                    'VALUE_X': ['12', '25', '33', '123'],
                    'CORDS': ['1', '2', '4', '5'],
                    'VALUE': [45, 23, 45, 2]})
How do I write the pandas merge statement to do this joining?
Answer 1
Score: 2
You can use a merge with a regex, using str.extract to extract the part of the path anchored at the end of the string:
import re
pattern = f"({'|'.join(df2['F_NAME'].apply(re.escape))})$"
df3 = df1.merge(df2, left_on=df1['PATH'].str.extract(pattern, expand=False),
right_on='F_NAME', how='left')
Output:
                    PATH  VALUE            F_NAME VALUE_X CORDS
0    C:\FODLER\Test1.jpg     45  FODLER\Test1.jpg      12     1
1  C:\A\FODLER\Test2.jpg     23  FODLER\Test2.jpg      25     2
2  C:\A\FODLER\Test3.jpg     45  FODLER\Test3.jpg      33     4
3  C:\A\FODLER\Test4.jpg      2  FODLER\Test4.jpg     123     5
pattern:
(FODLER\\Test1\.jpg|FODLER\\Test2\.jpg|FODLER\\Test6\.jpg|FODLER\\Test3\.jpg|FODLER\\Test4\.jpg|FODLER\\Test9\.jpg)$
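For anyone who wants to run this end to end, here is a minimal self-contained sketch using the sample frames from the question. The names `alternatives` and `key` are only illustrative helpers, not part of the answer above; `key` holds the F_NAME found at the end of each PATH (NaN when nothing matches).

```python
import re
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg', r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg', r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg', r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})

# re.escape makes the backslash and the dot in each F_NAME literal.
alternatives = '|'.join(map(re.escape, df2['F_NAME']))
pattern = f'({alternatives})$'  # "$" anchors the match at the end of PATH

# Illustrative helper: the F_NAME that each PATH ends with (NaN if none).
key = df1['PATH'].str.extract(pattern, expand=False)

# Left join, so rows of df1 without a matching F_NAME would be kept with NaN.
df3 = df1.merge(df2, left_on=key, right_on='F_NAME', how='left')
print(df3)
```

Because the pattern is one big alternation, it grows with the number of rows in df2, so this is probably best suited to a moderate number of file names.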
Alternatively, if the part of PATH to match always consists of exactly 2 components (folder\filename.ext), you can assign a column with the trailing part of the path before merging:
df3 = (df1
       .assign(F_NAME=df1['PATH'].str.extract(r'([^\\]+\\[^\\]+)$', expand=False))
       .merge(df2, how='left')
      )
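A small runnable sketch of this second variant, again on the question's sample data. The intermediate name `tail` is only an illustrative helper; the sketch assumes every F_NAME is exactly folder\filename.ext, as stated above.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({'PATH': [r'C:\FODLER\Test1.jpg',
                             r'C:\A\FODLER\Test2.jpg',
                             r'C:\A\FODLER\Test3.jpg',
                             r'C:\A\FODLER\Test4.jpg'],
                    'VALUE': [45, 23, 45, 2]})
df2 = pd.DataFrame({'F_NAME': [r'FODLER\Test1.jpg', r'FODLER\Test2.jpg',
                               r'FODLER\Test6.jpg', r'FODLER\Test3.jpg',
                               r'FODLER\Test4.jpg', r'FODLER\Test9.jpg'],
                    'VALUE_X': ['12', '25', '97', '33', '123', '0'],
                    'CORDS': ['1', '2', '3', '4', '5', '6']})

# Capture the last two backslash-separated components of each PATH.
tail = df1['PATH'].str.extract(r'([^\\]+\\[^\\]+)$', expand=False)

# Without an explicit `on`, merge joins on the columns both frames share,
# which here is only F_NAME.
df3 = df1.assign(F_NAME=tail).merge(df2, how='left')
print(df3)
```

This version does not depend on the contents of df2 to build the pattern, so it should scale better than the alternation, at the cost of the fixed two-component assumption.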
Answer 2
Score: 1
You can try this:
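import re
pattern = '|'.join(map(re.escape, df2['F_NAME']))  # escape the backslashes and dots in F_NAME
df3 = df1.merge(df2, left_on=df1['PATH'].str.extract(f'({pattern})', expand=False), right_on='F_NAME', how='inner')  # inner join drops PATHs that contain no F_NAME
print(df3)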
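One thing to watch with any pattern built by joining the F_NAME values: they contain backslashes and dots, which are regex metacharacters. The short sketch below, using two of the question's file names, illustrates what happens with and without `re.escape`; the variable names are illustrative only.

```python
import re

# Two F_NAME values from the question.
names = [r'FODLER\Test1.jpg', r'FODLER\Test2.jpg']

# Joined as-is, "\T" is read as a regex escape sequence, which Python rejects.
try:
    re.compile('|'.join(names))
    raw_ok = True
except re.error:
    raw_ok = False

# Escaped, the backslash and the dot are matched literally.
escaped = re.compile('|'.join(map(re.escape, names)))

print(raw_ok)
print(escaped.search(r'C:\A\FODLER\Test2.jpg') is not None)
```

The same applies to `str.contains`, which treats its argument as a regular expression by default.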