January 3, 2020 14:48:27

Map two data frames by common string elements from List column

Question

I want to map two data frames when the string elements in two columns match. The column they have in common holds lists of strings, shown comma-separated below. I also tried the map function after converting ClusterDf to a dictionary, but it didn't work.

df 
Text
[Temp,Temp2]
[Temp4,Temp7,Temp2]


ClusterDf
Label             Member
[Cluster1]    [Temp,Temp8]
[Cluster2]   [Temp4,Temp7]

I want output like this:

df 
Text                     Label  
[Temp,Temp2]             [Cluster1]  
[Temp4,Temp7,Temp2]      [Cluster2]

Answer 1

Score: 1

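Create a dictionary from ClusterDf that maps each member string to its cluster label, then add the new column with map, using next with iter to take the first matching label and fall back to a default when there is no match: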

d = {v: a[0] for a, b in zip(ClusterDf['Label'], ClusterDf['Member']) for v in b}
print(d)
{'Temp': 'Cluster1', 'Temp8': 'Cluster1', 'Temp4': 'Cluster2', 'Temp7': 'Cluster2'}

df['Label'] = df['Text'].map(lambda x: next(iter(d[y] for y in x if y in d), 'no match'))
print(df)
                    Text     Label
0          [Temp, Temp2]  Cluster1
1  [Temp4, Temp7, Temp2]  Cluster2
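
For anyone who wants to run this end to end, here is a minimal sketch of the same first-match lookup written as an explicit loop. It assumes every cell in both frames is a real Python list of strings (not a string that merely looks like a list). The names member_to_label and first_label are hypothetical helpers introduced here, not part of the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question; each cell is assumed to be a Python list.
df = pd.DataFrame({"Text": [["Temp", "Temp2"], ["Temp4", "Temp7", "Temp2"]]})
ClusterDf = pd.DataFrame({
    "Label": [["Cluster1"], ["Cluster2"]],
    "Member": [["Temp", "Temp8"], ["Temp4", "Temp7"]],
})

# Map every member string to the single label of its cluster.
member_to_label = {
    member: label[0]
    for label, members in zip(ClusterDf["Label"], ClusterDf["Member"])
    for member in members
}


def first_label(items, default="no match"):
    """Return the label of the first item found in the lookup (hypothetical helper)."""
    for item in items:
        if item in member_to_label:
            return member_to_label[item]
    return default


df["Label"] = df["Text"].map(first_label)
print(df)
```

The loop does the same job as the next(...) one-liner: it stops at the first element that has a cluster and returns the default only when none of the elements is known.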

If you need each label wrapped in a list:

df['Label'] = df['Text'].map(lambda x: [next(iter(d[y] for y in x if y in d), 'no match')])
print(df)
                    Text       Label
0          [Temp, Temp2]  [Cluster1]
1  [Temp4, Temp7, Temp2]  [Cluster2]

If you want all matching labels, where any exist:

df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])
print(df)
                    Text                 Label
0          [Temp, Temp2]            [Cluster1]
1  [Temp4, Temp7, Temp2]  [Cluster2, Cluster2]
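
Note that the all-matches version repeats a label once per matching member, which is why the second row shows Cluster2 twice. If that is not wanted, one possible variation (a sketch, not part of the original answer) is to drop the repeats while keeping first-seen order with dict.fromkeys. The helper name unique_labels is hypothetical, and the lookup dictionary is written out by hand here so the snippet runs on its own.

```python
import pandas as pd

# Sample frame from the question; each cell is assumed to be a Python list.
df = pd.DataFrame({"Text": [["Temp", "Temp2"], ["Temp4", "Temp7", "Temp2"]]})

# Member -> label lookup, written out literally for a standalone example.
d = {"Temp": "Cluster1", "Temp8": "Cluster1", "Temp4": "Cluster2", "Temp7": "Cluster2"}


def unique_labels(items):
    """Collect every matching label once, in first-seen order (hypothetical helper)."""
    # dict.fromkeys keeps insertion order and silently discards repeated keys.
    return list(dict.fromkeys(d[y] for y in items if y in d))


df["Label"] = df["Text"].map(unique_labels)
print(df)
```

A row whose elements belong to several clusters still gets every distinct label, and a row with no known element gets an empty list.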

Answer 2

Score: 0

Thanks @jezrael, the third solution worked perfectly for me. Thanks a lot.
You made my day.

df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])
