基于一个字符串聚合列合并两个DataFrame

huangapple go评论81阅读模式
英文:

Merge two DataFrames based on a string aggregated column

问题

我有以下包含的数据框(DF):

Lookup Definition
A Apple
B Banana
C Carrot

我还有另一个数据框(DF2):

SNo Lookup Values
1 ['A', 'B']
2 ['A', 'C']
3 ['B', 'C']

注意:"Lookup Values"列是一个字符串列表

执行JOIN操作以获取以下结果的最简单方法是什么:

SNo Lookup Values Lookup Definitions
1 A, B Apple, Banana
2 A, C Apple, Carrot
3 B, C Banana, Carrot
英文:

I have the below DF that contains

Lookup Definition
A Apple
B Banana
C Carrot

I have another DF2:

SNo Lookup Values
1 ['A', 'B']
2 ['A', 'C']
3 ['B', 'C']

Note: "Lookup Values" column is a list of strings

What is the most simple way of performing the JOIN to get the below:

SNo Lookup Values Lookup Definitions
1 A, B Apple, Banana
2 A, C Apple, Carrot
3 B, C Banana, Carrot

答案1

得分: 4

这是使用map()函数的一种方法:

(df2.explode('Lookup Values')
.dropna()
.assign(**{'Lookup Definitions': lambda x: x['Lookup Values'].map(dict(df.to_records(index=False)))})
.groupby('SNo')
.agg(', '.join)
.reset_index())

输出结果:

   SNo Lookup Values Lookup Definitions
0    1          A, B      Apple, Banana
1    2          A, C      Apple, Carrot
2    3          B, C     Banana, Carrot
英文:

Here is a way with using map()

(df2.explode('Lookup Values')
.dropna()
.assign(**{'Lookup Definitions':lambda x: x['Lookup Values'].map(dict(df.to_records(index = False)))})
.groupby('SNo')
.agg(', '.join)
.reset_index())

Output:

   SNo Lookup Values Lookup Definitions
0    1          A, B      Apple, Banana
1    2          A, C      Apple, Carrot
2    3          B, C     Banana, Carrot

答案2

得分: 3

尝试使用explode、merge和groupby函数:

(df2.explode('Lookup Values')
    .merge(df1, left_on='Lookup Values', right_on='Lookup', how='left')
    .groupby('SNo', as_index=False)
    .agg({'Lookup':','.join, 'Definition':','.join})
)

输出结果:

   SNo Lookup     Definition
0    1    A,B   Apple,Banana
1    2    B,C  Banana,Carrot
2    3    C,A   Carrot,Apple
英文:

Try explode, merge, then groupby:

(df2.explode('Lookup Values')
    .merge(df1, left_on='Lookup Values', right_on='Lookup', how='left')
    .groupby('SNo', as_index=False)
    .agg({'Lookup':','.join, 'Definition':','.join})
)

Output:

   SNo Lookup     Definition
0    1    A,B   Apple,Banana
1    2    B,C  Banana,Carrot
2    3    C,A   Carrot,Apple

答案3

得分: 3

  1. explode 列表列
  2. merge 获取定义
  3. 根据需要进行 groupbyagg
>>> df2.explode("Lookup Values").merge(df, left_on="Lookup Values", right_on="Lookup").groupby("SNo")[["Lookup Values", "Definition"]].agg(", ".join)

    Lookup Values      Definition
SNo                              
1            A, B   Apple, Banana
2            A, C   Apple, Carrot
3            B, C  Banana, Carrot
英文:
  1. explode the list columns
  2. merge to get the definitions
  3. groupby and agg as needed
>>> df2.explode("Lookup Values").merge(df, left_on="Lookup Values", right_on="Lookup").groupby("SNo")[["Lookup Values", "Definition"]].agg(", ".join)

    Lookup Values      Definition
SNo                              
1            A, B   Apple, Banana
2            A, C   Apple, Carrot
3            B, C  Banana, Carrot

答案4

得分: 2

另一种解决方案:

df2["查找定义"] = df2["查找值"].apply(
    lambda x, m=df1.set_index("查找")["定义"].to_dict(): ", ".join(
        m[v] for v in x if v in m
    )
)
print(df2)

输出结果:

   序号  查找值       查找定义
0   1  [A, B]  苹果, 香蕉
1   2  [A, C]  苹果, 胡萝卜
2   3  [B, C]  香蕉, 胡萝卜
英文:

Another solution:

df2["Lookup Definitions"] = df2["Lookup Values"].apply(
    lambda x, m=df1.set_index("Lookup")["Definition"].to_dict(): ", ".join(
        m[v] for v in x if v in m
    )
)
print(df2)

Prints:

   SNo Lookup Values Lookup Definitions
0    1        [A, B]      Apple, Banana
1    2        [A, C]      Apple, Carrot
2    3        [B, C]     Banana, Carrot

huangapple
  • 本文由 发表于 2023年8月9日 02:16:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76862214.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定