Map two data frames by common string elements from List column

Question

I want to map two data frames when string elements from two columns match. The column they have in common holds comma-separated strings. I tried using map after converting to a dictionary, but it didn't work.

    df
    Text
    [Temp,Temp2]
    [Temp4,Temp7,Temp2]

    ClusterDf
    Label       Member
    [Cluster1]  [Temp,Temp8]
    [Cluster2]  [Temp4,Temp7]

I want output like this:

    df
    Text                 Label
    [Temp,Temp2]         [Cluster1]
    [Temp4,Temp7,Temp2]  [Cluster2]
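The question does not say whether the cells are Python lists or plain strings such as "[Temp,Temp2]". If they are strings, they would need to be split into lists before any element-wise matching. The following is a minimal sketch under that assumption; the helper name `to_list` is hypothetical and not part of the original post.

```python
import pandas as pd

# Assumption: every cell is a plain string like "[Temp,Temp2]", not a Python list.
df = pd.DataFrame({"Text": ["[Temp,Temp2]", "[Temp4,Temp7,Temp2]"]})
ClusterDf = pd.DataFrame({
    "Label": ["[Cluster1]", "[Cluster2]"],
    "Member": ["[Temp,Temp8]", "[Temp4,Temp7]"],
})


def to_list(cell):
    """Hypothetical helper: turn "[a,b]" into ["a", "b"]."""
    # Drop the surrounding brackets, split on commas, and trim whitespace.
    return [part.strip() for part in cell.strip("[]").split(",") if part.strip()]


df["Text"] = df["Text"].map(to_list)
ClusterDf["Label"] = ClusterDf["Label"].map(to_list)
ClusterDf["Member"] = ClusterDf["Member"].map(to_list)
```

If the columns already hold lists, this step can be skipped.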

Answer 1

Score: 1

Create a dictionary from ClusterDf, then add the new column with map, using next and iter to fall back to a default value when there is no match:

    # Map each member to the (single) label of its cluster
    d = {v: a[0] for a, b in zip(ClusterDf['Label'], ClusterDf['Member']) for v in b}
    print(d)
    # {'Temp': 'Cluster1', 'Temp8': 'Cluster1', 'Temp4': 'Cluster2', 'Temp7': 'Cluster2'}

    # Take the label of the first element found in d, else 'no match'
    df['Label'] = df['Text'].map(lambda x: next(iter(d[y] for y in x if y in d), 'no match'))
    print(df)
    #                     Text     Label
    # 0          [Temp, Temp2]  Cluster1
    # 1  [Temp4, Temp7, Temp2]  Cluster2
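For anyone who wants to run the first solution end to end, here is a self-contained sketch. It assumes the columns hold Python lists, and it adds a third row (`['Temp9']`) that is not in the question, purely to show the 'no match' fallback. The function name `first_label` is also an illustration.

```python
import pandas as pd

# Sample data from the question, assuming list-valued cells.
# The third row is an added assumption to exercise the fallback.
df = pd.DataFrame({"Text": [["Temp", "Temp2"], ["Temp4", "Temp7", "Temp2"], ["Temp9"]]})
ClusterDf = pd.DataFrame({
    "Label": [["Cluster1"], ["Cluster2"]],
    "Member": [["Temp", "Temp8"], ["Temp4", "Temp7"]],
})

# Member -> label lookup, built once.
d = {
    member: label[0]
    for label, members in zip(ClusterDf["Label"], ClusterDf["Member"])
    for member in members
}


def first_label(items, default="no match"):
    # next() with a default returns the first matching label,
    # or the default when the generator yields nothing.
    return next((d[item] for item in items if item in d), default)


df["Label"] = df["Text"].map(first_label)
print(df)
```

Note that `next` accepts a generator directly, so the extra `iter` call in the one-liner is harmless but not required.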

If you need the result as a list:

    # Same lookup, wrapped in a one-element list
    df['Label'] = df['Text'].map(lambda x: [next(iter(d[y] for y in x if y in d), 'no match')])
    print(df)
    #                     Text       Label
    # 0          [Temp, Temp2]  [Cluster1]
    # 1  [Temp4, Temp7, Temp2]  [Cluster2]

If you want all matches, where any exist:

    # One label per matching element, so labels can repeat
    df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])
    print(df)
    #                     Text                 Label
    # 0          [Temp, Temp2]            [Cluster1]
    # 1  [Temp4, Temp7, Temp2]  [Cluster2, Cluster2]
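The all-matches version repeats a label once per matching element, which differs from the `[Cluster2]` shown in the question's desired output. One possible way to keep each label once, in order of first appearance, is `dict.fromkeys`. This is a sketch rather than part of the original answer; it assumes list-valued cells, and the third row and the name `unique_labels` are added for illustration.

```python
import pandas as pd

# Sample data from the question, assuming list-valued cells.
# The third row is an added assumption that touches both clusters.
df = pd.DataFrame({"Text": [["Temp", "Temp2"], ["Temp4", "Temp7", "Temp2"], ["Temp", "Temp4", "Temp8"]]})
ClusterDf = pd.DataFrame({
    "Label": [["Cluster1"], ["Cluster2"]],
    "Member": [["Temp", "Temp8"], ["Temp4", "Temp7"]],
})

# Member -> label lookup, as in the answer above.
d = {
    member: label[0]
    for label, members in zip(ClusterDf["Label"], ClusterDf["Member"])
    for member in members
}


def unique_labels(items):
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(d[item] for item in items if item in d))


df["Label"] = df["Text"].map(unique_labels)
print(df)
```

Rows with no matching element get an empty list here instead of 'no match'.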

Answer 2

Score: 0

Thanks @jezrael, the third solution worked perfectly for me. Thanks a lot, you made my day.

    df['Label'] = df['Text'].map(lambda x: [d[y] for y in x if y in d])

huangapple
  • Posted on January 3, 2020, 14:48:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574307.html