Find combination of two lists in Pandas dataframe
Question

I have two lists:

    List1:
    123
    456
    789
    List2:
    321
    654
    987
I want to find the combinations between the two lists in a data frame, excluding combinations within a single list:

    123-321
    123-654
    123-987
    456-321
    456-654
    456-987
    789-321
    789-654
    789-987
    321-123
    321-456
    321-789
    654-123
    654-456
    654-789
    987-123
    987-456
    987-789
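
For reference, the 18 pairs listed above can be generated with `itertools.product`, taking each list as the left-hand side in turn. This is only a sketch of the intended set of combinations; the names `list1`, `list2`, and `pairs` are introduced here for illustration.

```python
from itertools import product

list1 = [123, 456, 789]
list2 = [321, 654, 987]

# Pairs across the two lists, in both orders.
# No pair takes both of its values from the same list.
pairs = list(product(list1, list2)) + list(product(list2, list1))

for left, right in pairs:
    print(f"{left}-{right}")
```

Pairs such as 987-654, where both values come from List2, never appear in `pairs`.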

With these combinations I want to check whether they are inside two columns of a data frame and if yes, extract the rows:

    A    B    Value
    123  321  0.5
    456  111  0.4
    987  654  0.3

Return:

    A    B    Value
    123  321  0.5
How can I extract the rows?

Answer 1

Score: 1

You can do a cross merge between two columns constructed from the two lists. Then check whether the A and B columns of df appear in that merged dataframe, using NumPy broadcasting.

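    import pandas as pd

    # Sample frame from the question
    df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'Value': [0.5, 0.4, 0.3]})

    a = [123, 456, 789]
    b = [321, 654, 987]
    # The cross merge yields every (a, b) pair; broadcasting compares each pair with each row of df
    m = pd.DataFrame({'A': a}).merge(pd.DataFrame({'B': b}), how='cross').to_numpy()[:, None] == df[['A', 'B']].to_numpy()
    # Keep the rows of df where both columns match at least one pair
    out = df[m.all(axis=-1).any(axis=0)]

    print(out)

         A    B  Value
    0  123  321    0.5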
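
Note that the cross merge above builds pairs only in the (a, b) order, so a row such as 321-123 would not match. If the reversed pairs from the question should match as well, one possible variation (an assumption about the intent, not part of the original answer) is to add the swapped pairs and use an inner merge. The names `ab`, `ba`, and `pairs` are illustrative.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'Value': [0.5, 0.4, 0.3]})
a = [123, 456, 789]
b = [321, 654, 987]

# All (a, b) pairs via a cross merge (how='cross' needs pandas 1.2 or later)
ab = pd.DataFrame({'A': a}).merge(pd.DataFrame({'B': b}), how='cross')
# The same pairs with the column labels swapped, giving the (b, a) order
ba = ab.rename(columns={'A': 'B', 'B': 'A'})
pairs = pd.concat([ab, ba], ignore_index=True)

# The inner merge keeps only the rows of df whose (A, B) values appear in pairs
out = df.merge(pairs, on=['A', 'B'], how='inner')
print(out)
```

Unlike boolean indexing, `merge` returns a fresh index, so the original row labels of `df` are not preserved in `out`.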

Answer 2

Score: 0

    import pandas as pd

    a = [123, 456, 789]
    b = [321, 654, 987]
    df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'value': [0.5, 0.4, 0.3]})
    print(df[(df.A.isin(a) & df.B.isin(b) & ~df.A.isin(b) & ~df.B.isin(a)) | (df.A.isin(b) & df.B.isin(a) & ~df.A.isin(a) & ~df.B.isin(b))])

Returns:

         A    B  value
    0  123  321    0.5

It works by using a boolean mask that checks that either:

  • column A is in list a but not in list b, and column B is in list b but not in list a
  • or the other way around
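
The same mask can be split into named parts, which may be easier to read and reuse. This is a sketch only: the helper `cross_list_mask` and its parameters `left` and `right` are hypothetical names introduced here, not part of the answer above.

```python
import pandas as pd


def cross_list_mask(df, a, b, left='A', right='B'):
    """Boolean mask for rows whose two values come from different lists."""
    left_in_a, left_in_b = df[left].isin(a), df[left].isin(b)
    right_in_a, right_in_b = df[right].isin(a), df[right].isin(b)
    # left value only in a, right value only in b
    forward = left_in_a & ~left_in_b & right_in_b & ~right_in_a
    # the other way around: left value only in b, right value only in a
    backward = left_in_b & ~left_in_a & right_in_a & ~right_in_b
    return forward | backward


a = [123, 456, 789]
b = [321, 654, 987]
df = pd.DataFrame({'A': [123, 456, 987], 'B': [321, 111, 654], 'value': [0.5, 0.4, 0.3]})

print(df[cross_list_mask(df, a, b)])
```

When the two lists share no values, as in the question, the negated checks never exclude anything; they only matter if a value appears in both lists.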

huangapple
  • Posted on March 31, 2023, 16:49:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75896565.html