Pandas 左连接与重复项

huangapple go评论62阅读模式
英文:

Pandas left join with duplicates

问题

你可以尝试使用merge方法时指定left_onright_on参数来实现你的需求,如下所示:

result = A.merge(B, how='left', left_on=['Name', 'Value'], right_on=['Name', 'Value1'])

这将确保第一个'A'仅与第一个'A'匹配,第二个与第二个,第三个与第三个,得到你期望的结果:

  Name  Value  Value1
0    A      1       1
1    A      2       4
2    A      3       6

这种方法允许你根据多列进行合并,以实现你想要的匹配方式。

英文:

I have a pandas data frame like

A = pd.DataFrame({'Name' : ['A', 'A','A'], 'Value' : [1,2,3]})
and another DataFrame 
B = pd.DataFrame({'Name': ['A', 'C', 'D', 'A', 'E', 'A'], 'Value1' :[1,2,3,4,5,6]})

When I merge these I get

A.merge(B, how='left', on='Name')
In [4]: A.merge(B, how='left', on='Name')
Out[4]: 
  Name  Value  Value1
0    A      1       1
1    A      1       4
2    A      1       6
3    A      2       1
4    A      2       4
5    A      2       6
6    A      3       1
7    A      3       4
8    A      3       6

Anyway to do this merge in a way such that first row with 'A' will match only with first row with 'A' in B, and second with second and third with third.
Final output like

  Name  Value  Value1
0    A      1       1
1    A      2       4
2    A      3       6

Thanks,

I tried doing left merge. I wasnt expecting anything different, but I am looking for a better way to do this.

Doing Inner join doesnt help either

A.merge(B, how='inner', on='Name')

  Name  Value  Value1
0    A      1       1
1    A      1       4
2    A      1       6
3    A      2       1
4    A      2       4
5    A      2       6
6    A      3       1
7    A      3       4
8    A      3       6

答案1

得分: 7

使用groupby.cumcount进行去重,并将其作为次要键传递给merge

A.merge(B, how='left',
        left_on=['Name', A.groupby('Name').cumcount()],
        right_on=['Name', B.groupby('Name').cumcount()]
       )#.drop(columns='key_1')

输出:

  Name  key_1  Value  Value1
0    A      0      1       1
1    A      1      2       4
2    A      2      3       6
英文:

Deduplicate with groupby.cumcount and pass it to merge as secondary key:

A.merge(B, how='left',
        left_on=['Name', A.groupby('Name').cumcount()],
        right_on=['Name', B.groupby('Name').cumcount()]
       )#.drop(columns='key_1')

Output:

  Name  key_1  Value  Value1
0    A      0      1       1
1    A      1      2       4
2    A      2      3       6

答案2

得分: 0

你正在请求的实际上不是一个连接操作。

但是,你可以像这样操作:

pd.concat([A, B[B.Name == "A"].reset_index().Value1], axis=1)
英文:

You are requesting something that is not actually a join.

You can do something like this however:

pd.concat([A, B[B.Name == "A"].reset_index().Value1], axis=1)

huangapple
  • 本文由 发表于 2023年2月14日 05:51:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/75441542.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定