How can I check whether a value in one column appears in another column when the queried column holds many values?
Question
A minimal reproducible example:
import pandas as pd

df1 = pd.DataFrame({
    'patient': ['patient1', 'patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4'],
    'gene': ['TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR', 'TYR'],
    'variant': ['buu', 'luu', 'stm', 'lol', 'bla', 'buu', 'lol', 'buu'],
    'genotype': ['buu,luu,hola', 'gulu,melon', 'melon,stm', 'melon,buu,lol', 'bla', 'het', 'het', 'het'],
})
print(df1)
    patient gene variant       genotype
0  patient1  TYR     buu   buu,luu,hola
1  patient1  TYR     luu     gulu,melon
2  patient1  TYR     stm      melon,stm
3  patient2  TYR     lol  melon,buu,lol
4  patient2  TYR     bla            bla
5  patient3  TYR     buu            het
6  patient3  TYR     lol            het
7  patient4  TYR     buu            het
What I have tried
df1.variant.isin(df1.genotype)
0 False
1 False
2 False
3 False
4 True
5 False
6 False
7 False
Name: variant, dtype: bool
This does not work: only row 4 matches, where the genotype is exactly the variant string. The expected result would be:
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 False
Name: variant, dtype: bool
I don't know in advance how many values each entry in the genotype column holds. It varies a lot, from 1 to 20.
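For context, the isin attempt above compares each variant against the whole genotype strings found anywhere in the column, not against the comma-separated items in the same row. A minimal sketch of that behaviour, using a shortened, illustrative subset of the data (the variable names variant and genotype are assumptions for this example):

```python
import pandas as pd

# Three illustrative rows taken from the question's data
variant = pd.Series(["buu", "luu", "bla"])
genotype = pd.Series(["buu,luu,hola", "gulu,melon", "bla"])

# isin treats every genotype entry as one opaque string, so "buu" is not
# considered equal to "buu,luu,hola"; only the exact match "bla" is found
print(variant.isin(genotype).tolist())  # [False, False, True]
```

This is why a row-wise comparison against the split genotype is needed.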
Answer 1
Score: 1
You can use DataFrame.apply + str.split:
print(df1.apply(lambda x: x['variant'] in x['genotype'].split(','), axis=1))
Prints:
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 False
dtype: bool
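If the genotype strings might contain spaces around the commas (an assumption, since the data shown in the question has none), the same apply + split idea can trim each item before comparing. A minimal sketch with hypothetical data; the names messy, variant_in_genotype and variant_found are made up for this example:

```python
import pandas as pd

# Hypothetical data with stray spaces around the separators
messy = pd.DataFrame({
    "variant": ["buu", "luu", "stm"],
    "genotype": ["buu, luu ,hola", "gulu,melon", "melon , stm"],
})

def variant_in_genotype(row):
    # Split on commas and trim each token before the membership test
    tokens = [token.strip() for token in row["genotype"].split(",")]
    return row["variant"] in tokens

# Store the row-wise result as a new boolean column
messy["variant_found"] = messy.apply(variant_in_genotype, axis=1)
print(messy["variant_found"].tolist())  # [True, False, True]
```

Keeping the result as a column makes it easy to filter afterwards, for example with messy[messy["variant_found"]].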
Answer 2
Score: 1
With a list comprehension:
[var in gen for var, gen in zip(df1["variant"], df1["genotype"])]
Output (after wrapping the list with the Series constructor, pd.Series(...)):
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 False
dtype: bool
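One caveat worth noting: var in gen on two strings is a substring test, so it gives the expected result on the question's data but could match partial names on other data. The sketch below contrasts it with a whole-item test on hypothetical rows (the frame demo and both result names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical rows chosen to expose the difference
demo = pd.DataFrame({
    "variant": ["bu", "lol", "mel"],
    "genotype": ["buu,luu", "melon,buu,lol", "melon"],
})

pairs = list(zip(demo["variant"], demo["genotype"]))

# Substring test: "bu" is found inside "buu,luu", "mel" inside "melon"
substring_hits = [var in gen for var, gen in pairs]

# Whole-item test: compare against the comma-separated items only
token_hits = [var in gen.split(",") for var, gen in pairs]

print(substring_hits)  # [True, True, True]
print(token_hits)      # [False, True, False]
```

If variant names can be prefixes or fragments of one another, the split-based version is probably the safer choice.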
Answer 3
Score: 0
You can write a simple function and use apply to run it on every row.
def check_variant(row):
    # str.find returns -1 when the variant text does not occur in the genotype string
    return row['genotype'].find(row['variant']) != -1
Then apply it row by row:
df1.apply(check_variant, axis=1)
Results:
0 True
1 False
2 True
3 True
4 True
5 False
6 False
7 False
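As an alternative to the row-wise approaches above, and not something proposed in any of the answers, the genotype column can be split and exploded so that the comparison runs column-wise. This is only a sketch under the assumption that the index of df1 is unique; the names exploded, token and result are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "patient": ["patient1", "patient1", "patient1", "patient2",
                "patient2", "patient3", "patient3", "patient4"],
    "gene": ["TYR"] * 8,
    "variant": ["buu", "luu", "stm", "lol", "bla", "buu", "lol", "buu"],
    "genotype": ["buu,luu,hola", "gulu,melon", "melon,stm",
                 "melon,buu,lol", "bla", "het", "het", "het"],
})

# One row per genotype item; the original row label is repeated in the index
exploded = df1.assign(token=df1["genotype"].str.split(",")).explode("token")

# Compare each item with its row's variant, then collapse back to one
# boolean per original row: True if any item matched
result = (exploded["variant"] == exploded["token"]).groupby(level=0).any()

print(result.tolist())
# [True, False, True, True, True, False, False, False]
```

Because each item is compared as a whole, this matches complete genotype values rather than substrings.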