How to check whether a value in one column is in another column when the queried column has many values?

Question

How can I check, row by row, whether the value in one column appears in another column when that other column holds many comma-separated values?

The minimal reproducible example:

    import pandas as pd

    df1 = pd.DataFrame({
        'patient': ['patient1', 'patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4'],
        'gene': ['TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR'],
        'variant': ['buu', 'luu', 'stm', 'lol', 'bla', 'buu', 'lol', 'buu'],
        'genotype': ['buu,luu,hola', 'gulu,melon', 'melon,stm', 'melon,buu,lol', 'bla', 'het', 'het', 'het'],
    })
    print(df1)

        patient gene variant       genotype
    0  patient1  TYR     buu   buu,luu,hola
    1  patient1  TYR     luu     gulu,melon
    2  patient1  TYR     stm      melon,stm
    3  patient2  TYR     lol  melon,buu,lol
    4  patient2  TYR     bla            bla
    5  patient3  TYR     buu            het
    6  patient3  TYR     lol            het
    7  patient4  TYR     buu            het

What I have tried

    df1.variant.isin(df1.genotype)

    0    False
    1    False
    2    False
    3    False
    4     True
    5    False
    6    False
    7    False
    Name: variant, dtype: bool

This does not work. The expected result would be:

    0     True
    1    False
    2     True
    3     True
    4     True
    5    False
    6    False
    7    False
    Name: variant, dtype: bool

I don't know in advance how many values each entry in the genotype column holds. It varies a lot, from 1 to 20.
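For context, here is a minimal sketch of why the `isin` attempt returns True only for the row whose genotype is a single value. It uses three rows from the question's data; the variable names are illustrative assumptions. `Series.isin` tests whether each variant equals any whole string in the genotype column, so it never looks inside the comma-separated lists.

```python
import pandas as pd

# Three rows taken from the question's data.
variant = pd.Series(['buu', 'luu', 'bla'])
genotype = pd.Series(['buu,luu,hola', 'gulu,melon', 'bla'])

# isin asks: "is this variant equal to ANY whole string in genotype?"
isin_mask = variant.isin(genotype)

# The same check written out by hand, to make the semantics explicit.
whole_strings = set(genotype)
manual_mask = variant.map(lambda v: v in whole_strings)

print(isin_mask.tolist())    # [False, False, True]
print(manual_mask.tolist())  # [False, False, True]
```

Only 'bla' matches, because 'bla' is the only variant that is identical to an entire genotype string.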

Answer 1

Score: 1

You can use DataFrame.apply + str.split:

    print(df1.apply(lambda x: x['variant'] in x['genotype'].split(','), axis=1))

Prints:

    0     True
    1    False
    2     True
    3     True
    4     True
    5    False
    6    False
    7    False
    dtype: bool
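As a possible alternative to calling a lambda on every row, the same per-row membership test can be sketched with `str.split` followed by `explode`. This is not part of the answer above, only an illustration under the assumption that the genotype values are plain comma-separated strings with no missing entries; the names `tokens`, `variants`, `hits`, and `result` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'variant': ['buu', 'luu', 'stm', 'lol', 'bla', 'buu', 'lol', 'buu'],
    'genotype': ['buu,luu,hola', 'gulu,melon', 'melon,stm', 'melon,buu,lol',
                 'bla', 'het', 'het', 'het'],
})

# One row per token; the original row label is repeated in the index.
tokens = df['genotype'].str.split(',').explode()

# Repeat each variant so it lines up with the tokens of its own row.
variants = df['variant'].loc[tokens.index]

# Compare position by position, then ask per original row: did any token match?
hits = pd.Series(tokens.to_numpy() == variants.to_numpy(), index=tokens.index)
result = hits.groupby(level=0).any()

print(result.tolist())
# [True, False, True, True, True, False, False, False]
```

Whether this is faster than `apply` depends on the data, so it is worth timing both on a realistic sample before choosing.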

Answer 2

Score: 1

With a list comprehension:

    [var in gen for var, gen in zip(df1["variant"], df1["genotype"])]

Output (after wrapping the list in the Series constructor, pd.Series(...)):

    0     True
    1    False
    2     True
    3     True
    4     True
    5    False
    6    False
    7    False
    dtype: bool
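One caveat worth illustrating: `var in gen` on two strings is a substring test, so it can also match part of a longer token. The sketch below uses hypothetical data (not from the question) chosen to expose the difference, and compares it with splitting on commas first.

```python
import pandas as pd

# Hypothetical data chosen to expose the difference; not from the question.
df = pd.DataFrame({
    'variant': ['bu', 'mel', 'lol'],
    'genotype': ['buu,luu', 'gulu,melon', 'melon,lol'],
})

pairs = list(zip(df['variant'], df['genotype']))

# Substring test: 'bu' is found inside 'buu', 'mel' inside 'melon'.
substring = [var in gen for var, gen in pairs]

# Exact-token test: split first, then test membership in the resulting list.
exact = [var in gen.split(',') for var, gen in pairs]

print(substring)  # [True, True, True]
print(exact)      # [False, False, True]
```

On the question's data both versions give the same result, so the substring form is fine as long as no variant name is contained in another.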

Answer 3

Score: 0

You need to create a simple function and use apply to run it on every row.

    def check_variant(row):
        # str.find returns -1 when the substring is absent
        return row['genotype'].find(row['variant']) != -1

Then call it:

    df1.apply(check_variant, axis=1)

Results:

    0     True
    1    False
    2     True
    3     True
    4     True
    5    False
    6    False
    7    False
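Because `str.find` looks for a substring, it would also report a match when the variant is only part of a longer token. If exact matches are required, one way to keep a string-based check is to pad both sides with the delimiter. This is a sketch on hypothetical data, and `check_variant_exact` is an illustrative name, not part of the answer above.

```python
import pandas as pd

# Hypothetical data; 'bu' and 'mel' are only parts of longer tokens.
df = pd.DataFrame({
    'variant': ['bu', 'mel', 'lol'],
    'genotype': ['buu,luu', 'gulu,melon', 'melon,lol'],
})

def check_variant_exact(row):
    # Surround both strings with commas so only whole tokens can match.
    padded_genotype = f",{row['genotype']},"
    return f",{row['variant']}," in padded_genotype

# Same substring check as str.find(...) != -1, kept here for comparison.
loose = df.apply(lambda row: row['genotype'].find(row['variant']) != -1, axis=1)
strict = df.apply(check_variant_exact, axis=1)

print(loose.tolist())   # [True, True, True]
print(strict.tolist())  # [False, False, True]
```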

huangapple
  • Posted on April 11, 2023, 01:29:23
  • When reposting, please keep this article's link: https://go.coder-hub.com/75979268.html