Map two data frames by common string elements from List column
Question
I want to map two data frames when string elements in two of their columns match. The common column holds comma-separated strings. I tried the map function after converting one frame to a dictionary, but it didn't work.
df

Text
[Temp,Temp2]
[Temp4,Temp7,Temp2]

ClusterDf

Label       Member
[Cluster1]  [Temp,Temp8]
[Cluster2]  [Temp4,Temp7]
The output I want is:

df

Text                 Label
[Temp,Temp2]         [Cluster1]
[Temp4,Temp7,Temp2]  [Cluster2]
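The question describes the cells as comma-separated strings, while list-based lookups need real Python lists. If the cells really are strings such as '[Temp4,Temp7,Temp2]', one possible first step is to parse them. This is only a sketch under that assumption; `parse_cell` is a hypothetical helper, not a pandas function.

```python
def parse_cell(cell):
    """Turn a string like '[Temp4,Temp7,Temp2]' into ['Temp4', 'Temp7', 'Temp2']."""
    # Drop surrounding whitespace and the enclosing square brackets.
    inner = cell.strip().strip("[]")
    # Split on commas and discard empty pieces (e.g. for an empty '[]' cell).
    return [part.strip() for part in inner.split(",") if part.strip()]


parsed = parse_cell("[Temp4,Temp7,Temp2]")
print(parsed)
```

Assuming the column holds such strings, it could be converted with something like df['Text'] = df['Text'].map(parse_cell) before doing any list-based matching.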
Answer 1
Score: 1
Create a dictionary from ClusterDf, then add the new column with map, using next and iter to fall back to a default value when there is no match:
# Map each member string to the first label of its row
d = {v: a[0] for a, b in zip(ClusterDf['Label'], ClusterDf['Member']) for v in b}
print(d)
{'Temp': 'Cluster1', 'Temp8': 'Cluster1', 'Temp4': 'Cluster2', 'Temp7': 'Cluster2'}

# Take the label of the first element found in d, or 'no match' if none is found
df['Label'] = df['Text'].map(lambda x: next(iter(d[y] for y in x if y in d), 'no match'))
print(df)
                    Text     Label
0          [Temp, Temp2]  Cluster1
1  [Temp4, Temp7, Temp2]  Cluster2
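To try the lookup logic without pandas, here is a minimal sketch of the same two steps on plain Python lists. The sample data mirrors the question; `build_lookup` and `first_label` are hypothetical helper names, not part of pandas or of the original answer.

```python
# Plain lists stand in for the DataFrame columns from the question.
labels = [["Cluster1"], ["Cluster2"]]
members = [["Temp", "Temp8"], ["Temp4", "Temp7"]]
texts = [["Temp", "Temp2"], ["Temp4", "Temp7", "Temp2"]]


def build_lookup(labels, members):
    """Map every member string to the first label of its row."""
    return {m: label[0] for label, row in zip(labels, members) for m in row}


def first_label(items, lookup, default="no match"):
    """Return the label of the first item present in lookup, else default."""
    return next((lookup[item] for item in items if item in lookup), default)


lookup = build_lookup(labels, members)
result = [first_label(t, lookup) for t in texts]
print(result)
```

If the DataFrame cells hold real lists, the same helper could presumably be applied as df['Text'].map(lambda x: first_label(x, lookup)).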
If you need the label as a list:
df['Label'] = df['Text'].map(lambda x: [next(iter(d[y] for y in x if y in d), 'no match')])
print(df)
                    Text       Label
0          [Temp, Temp2]  [Cluster1]
1  [Temp4, Temp7, Temp2]  [Cluster2]
If you want all matching labels, where any exist:
df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])
print(df)
                    Text                 Label
0          [Temp, Temp2]            [Cluster1]
1  [Temp4, Temp7, Temp2]  [Cluster2, Cluster2]
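Note that the all-matches variant emits one label per matching element, which is why the second row lists Cluster2 twice. If each label should appear only once, one option is to deduplicate while keeping first-seen order. This is a sketch under the assumption that repeats are unwanted; `unique_labels` is a hypothetical helper, and d is the dictionary printed in the answer.

```python
# The lookup dictionary as printed in the answer.
d = {'Temp': 'Cluster1', 'Temp8': 'Cluster1', 'Temp4': 'Cluster2', 'Temp7': 'Cluster2'}


def unique_labels(items, lookup):
    """Collect the labels of all matching items, dropping repeats."""
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7),
    # so labels come out in the order they were first matched.
    return list(dict.fromkeys(lookup[item] for item in items if item in lookup))


deduped = unique_labels(['Temp4', 'Temp7', 'Temp2'], d)
print(deduped)
```

With list-valued cells this could be used as df['Label'] = df['Text'].map(lambda x: unique_labels(x, d)); rows with no match would then get an empty list.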
Answer 2
Score: 0
Thanks @jezrael, the third solution worked perfectly for me. Thanks a lot.
You made my day.
df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])