How to check if a value in one column is in another column when the queried column has many values?

Question

How can I check, row by row, whether the value in one column appears in another column when that other column holds many comma-separated values?

The minimal reproducible example:

import pandas as pd

df1 = pd.DataFrame({'patient': ['patient1', 'patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4'],
                    'gene': ['TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR'],
                    'variant': ['buu', 'luu', 'stm', 'lol', 'bla', 'buu', 'lol', 'buu'],
                    'genotype': ['buu,luu,hola', 'gulu,melon', 'melon,stm', 'melon,buu,lol', 'bla', 'het', 'het', 'het']})

print(df1)

    patient gene variant       genotype
0  patient1  TYR     buu   buu,luu,hola
1  patient1  TYR     luu     gulu,melon
2  patient1  TYR     stm      melon,stm
3  patient2  TYR     lol  melon,buu,lol
4  patient2  TYR     bla            bla
5  patient3  TYR     buu            het
6  patient3  TYR     lol            het
7  patient4  TYR     buu            het

What I have tried

df1.variant.isin(df1.genotype)

0    False
1    False
2    False
3    False
4     True
5    False
6    False
7    False
Name: variant, dtype: bool

This does not work. The expected result would be:

0    True
1    False
2    True
3    True
4    True
5    False
6    False
7    False
Name: variant, dtype: bool

I don't know in advance how many values each entry in the genotype column holds. It varies a lot, from 1 to 20.
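For context, here is a small sketch of why the isin attempt matches only the row whose genotype is a single value. As far as I understand it, isin tests each variant against the set of whole genotype strings from the entire column, so it neither splits on commas nor compares row by row. The three-row frame below is a cut-down copy of the example data, and the names whole_cell and row_wise are only illustrative.

```python
import pandas as pd

# Cut-down copy of the question's data (rows 0, 1 and 4).
df = pd.DataFrame({
    "variant": ["buu", "luu", "bla"],
    "genotype": ["buu,luu,hola", "gulu,melon", "bla"],
})

# isin() compares each variant with the complete genotype strings,
# so "buu" is tested against "buu,luu,hola" as one value and fails.
whole_cell = df["variant"].isin(df["genotype"])

# Splitting first turns every genotype cell into a list of tokens,
# which can then be checked against the variant of the same row.
tokens = df["genotype"].str.split(",")
row_wise = pd.Series(
    [v in t for v, t in zip(df["variant"], tokens)], index=df.index
)

print(whole_cell.tolist())  # [False, False, True]
print(row_wise.tolist())    # [True, False, True]
```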

Answer 1

Score: 1

You can use DataFrame.apply + str.split:

print(df1.apply(lambda x: x['variant'] in x['genotype'].split(','), axis=1))

Prints:

0     True
1    False
2     True
3     True
4     True
5    False
6    False
7    False
dtype: bool
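If it helps, the same apply + split idea can be wrapped in a small function so the result can be stored as a column or used as a filter. This is only a sketch: the helper name variant_in_genotype, the sep parameter, the whitespace stripping and the new column name are my own assumptions, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "patient": ["patient1", "patient1", "patient1", "patient2",
                "patient2", "patient3", "patient3", "patient4"],
    "gene": ["TYR"] * 8,
    "variant": ["buu", "luu", "stm", "lol", "bla", "buu", "lol", "buu"],
    "genotype": ["buu,luu,hola", "gulu,melon", "melon,stm", "melon,buu,lol",
                 "bla", "het", "het", "het"],
})


def variant_in_genotype(frame, sep=","):
    """Hypothetical helper: True where `variant` is one of the `genotype` tokens."""
    return frame.apply(
        # strip() tolerates values written as "buu, luu" (an assumption).
        lambda row: row["variant"] in [t.strip() for t in row["genotype"].split(sep)],
        axis=1,
    )


mask = variant_in_genotype(df1)
df1["variant_in_genotype"] = mask  # keep the flag next to the data
matches = df1[mask]                # rows whose variant is listed in genotype

print(matches[["patient", "variant", "genotype"]])
```

Note that apply with axis=1 calls the lambda once per row in Python, so on very large frames a list comprehension over the two columns may be faster.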

Answer 2

Score: 1

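With a list comprehension:

[var in gen for var, gen in zip(df1["variant"], df1["genotype"])]

Note that var in gen is a substring test on the raw genotype string, so it can also match part of a longer value. On the example data it gives the expected result.

Output (after wrapping the list in pd.Series(...)):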

0     True
1    False
2     True
3     True
4     True
5    False
6    False
7    False
dtype: bool
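One thing to watch with this approach: var in gen on two strings is a substring test, not a test against the comma-separated items. A quick sketch of the difference follows; the rows are made up for illustration and are not from the question.

```python
import pandas as pd

# Made-up rows chosen to expose the difference between the two tests.
df = pd.DataFrame({
    "variant": ["lu", "luu", "stm"],
    "genotype": ["buu,luu,hola", "buu,luu,hola", "melon,stm"],
})

# Substring test: "lu" is found inside "buu,luu,hola".
substring = [var in gen for var, gen in zip(df["variant"], df["genotype"])]

# Token test: "lu" is not one of ["buu", "luu", "hola"].
exact = [var in gen.split(",") for var, gen in zip(df["variant"], df["genotype"])]

print(pd.Series(substring).tolist())  # [True, True, True]
print(pd.Series(exact).tolist())      # [False, True, True]
```

If variant names can be prefixes or fragments of each other, splitting on the comma first is probably the safer choice.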

Answer 3

Score: 0

You can write a small function and use apply to run it on every row.

def check_variant(row):
    # True when the variant text occurs anywhere in the genotype string
    return row['genotype'].find(row['variant']) != -1

Then apply it:

df1.apply(check_variant, axis=1)

Result:

0     True
1    False
2     True
3     True
4     True
5    False
6    False
7    False
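For larger frames, a version that avoids calling a Python function per row might be worth trying. The sketch below is not from any of the answers: it splits genotype into tokens, uses Series.explode (available in pandas 0.25 and later, as far as I know) to get one token per row, compares each token with the variant of its original row, and then collapses back with groupby(level=0).any(). The names tokens, hit and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "variant": ["buu", "luu", "stm", "lol", "bla", "buu", "lol", "buu"],
    "genotype": ["buu,luu,hola", "gulu,melon", "melon,stm", "melon,buu,lol",
                 "bla", "het", "het", "het"],
})

# One token per row; the index repeats the label of the originating row.
tokens = df1["genotype"].str.split(",").explode()

# Look up the variant for each token's original row and compare elementwise.
variants = df1.loc[tokens.index, "variant"].to_numpy()
hit = pd.Series(tokens.to_numpy() == variants, index=tokens.index)

# A row matches if any of its tokens equals its variant.
result = hit.groupby(level=0).any()

print(result.tolist())
# [True, False, True, True, True, False, False, False]
```

Whether this is actually faster than the list comprehension depends on the data, so it is worth timing both on a realistic sample.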

huangapple
  • Published on April 11, 2023, 01:29:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979268.html