Find combinations of two lists in a Pandas DataFrame

Question

I have two lists:

List1:
123 
456 
789 

List2:
321
654
987

I want to find every combination that pairs a value from one list with a value from the other list, in either order, but without pairing values from the same list:

123-321
123-654
123-987
456-321
456-654
456-987
789-321
789-654
789-987
321-123
321-456
321-789
654-123
654-456
654-789
987-123
987-456
987-789

I then want to check whether any of these combinations appear in two columns (A and B) of a data frame and, if so, extract the matching rows:

  A    B  Value
123  321    0.5
456  111    0.4
987  654    0.3

Return:

  A    B  Value
123  321    0.5

How can I extract the rows?

Answer 1

Score: 1

You can do a cross merge between two single-column DataFrames built from the two lists, then use NumPy broadcasting to check whether each (A, B) row of df appears in the merged DataFrame.

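import pandas as pd

a = [123, 456, 789]
b = [321, 654, 987]

# Sample frame from the question
df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'Value': [0.5, 0.4, 0.3]})

m = pd.DataFrame({'A': a}).merge(pd.DataFrame({'B': b}), how='cross').to_numpy()[:, None] == df[['A', 'B']].to_numpy()
out = df[m.all(axis=-1).any(axis=0)]
print(out)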

     A    B  Value
0  123  321    0.5
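
The one-liner above packs several steps together, so here is a sketch that unpacks it with the intermediate array shapes spelled out. It assumes pandas 1.2 or later (needed for how='cross') and reuses the sample frame from the question. One caveat: the cross merge only produces pairs in (a, b) order. The stacking step below is an assumption on my part that the reversed pairs listed in the question (such as 321-123) should match as well; drop it if only one orientation matters.

```python
import numpy as np
import pandas as pd

a = [123, 456, 789]
b = [321, 654, 987]

# Sample frame from the question.
df = pd.DataFrame({"A": [123, 456, 987], "B": [321, 111, 654], "Value": [0.5, 0.4, 0.3]})

# Every (a, b) pair: 3 x 3 = 9 rows, 2 columns.
pairs = pd.DataFrame({"A": a}).merge(pd.DataFrame({"B": b}), how="cross").to_numpy()

# Assumption: reversed pairs such as 321-123 should match too,
# so stack the column-swapped pairs underneath -> shape (18, 2).
pairs = np.vstack([pairs, pairs[:, ::-1]])

rows = df[["A", "B"]].to_numpy()            # shape (3, 2)
eq = pairs[:, None, :] == rows[None, :, :]  # shape (18, 3, 2)

# A df row matches if both of its columns equal some pair.
mask = eq.all(axis=-1).any(axis=0)          # shape (3,), one flag per df row

out = df[mask]
print(out)
```

The comparison array grows as (number of pairs) x (number of rows), so for long lists or large frames an inner merge on ['A', 'B'] may be the lighter option.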

Answer 2

Score: 0

import pandas as pd

a = [123, 456, 789]
b = [321, 654, 987]

df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'value': [0.5, 0.4, 0.3]})

print(df[(df.A.isin(a) & df.B.isin(b) & ~df.A.isin(b) & ~df.B.isin(a)) | (df.A.isin(b) & df.B.isin(a) & ~df.A.isin(a) & ~df.B.isin(b))])

Returns:

     A    B  value
0  123  321    0.5

It works by building a boolean mask that checks that either:

  • column A is in list a but not in list b, and column B is in list b but not in list a
  • or the other way around
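
The two bullets above can be sketched as two named masks, which may be easier to read than the single long expression. The helper name cross_list_mask is hypothetical, and the fourth row (654, 789) is made up here purely to show the "other way around" case; neither comes from the original post.

```python
import pandas as pd

a = [123, 456, 789]
b = [321, 654, 987]

# Sample frame from the question plus one made-up reversed row (654, 789).
df = pd.DataFrame({
    "A": [123, 456, 987, 654],
    "B": [321, 111, 654, 789],
    "value": [0.5, 0.4, 0.3, 0.2],
})


def cross_list_mask(frame, first, second):
    """Hypothetical helper: True where A is only in `first` and B is only in `second`."""
    a_only = frame["A"].isin(first) & ~frame["A"].isin(second)
    b_only = frame["B"].isin(second) & ~frame["B"].isin(first)
    return a_only & b_only


forward = cross_list_mask(df, a, b)   # A from list a, B from list b
backward = cross_list_mask(df, b, a)  # the other way around

out = df[forward | backward]
print(out)
```

With this data the first and last rows are kept: (123, 321) matches in the forward direction and (654, 789) in the backward one, while (987, 654) is dropped because both values come from list b.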

huangapple
  • Posted on March 31, 2023 at 16:49:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75896565.html