How can I use pandas.query() to check if a string exists in a list within the dataframe?

Question

Let's say I have a dataframe that looks like:

    col1
0   ['str1', 'str2']
1   ['str3', 'str4']
2   []
3   ['str2', 'str4']
4   ['str1', 'str3']
5   []

I'm trying to craft a df.query() string that would be the equivalent of saying "'str3' in col1".
So it would return:

    col1
1   ['str3', 'str4']
4   ['str1', 'str3']

I've tried df.query("col1.str.contains('str3')") but that results in

"None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n ...\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64', length=652)] are in the [index]"

I'm guessing this is because many of the lists in this column may be empty and get converted to NaN floats instead of strings?

I'd strongly prefer to use query strings for this rather than list constructors, since I want this to be a script where other users can filter a dataframe using query strings that they craft themselves.
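
A note on the guess about NaN values: they most likely do not come from the empty lists. The .str accessor methods expect string elements, and str.contains yields NaN for any element that is not a string, which here means every row, since each one holds a list. A minimal sketch, assuming the column was built from plain Python lists as in the sample data:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how the column was built).
df = pd.DataFrame({"col1": [["str1", "str2"], ["str3", "str4"], [],
                            ["str2", "str4"], ["str1", "str3"], []]})

# str.contains works element-wise on strings; list elements yield NaN
# rather than True/False, so the result cannot be used as a boolean mask.
mask = df["col1"].str.contains("str3")
print(mask.isna().all())
```

Since the mask holds only NaN, query ends up looking the NaN values up as labels, which would be consistent with the "None of [...] are in the [index]" error.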

Answer 1

Score: 1

If you have a Series of lists, you can't use query; instead, use boolean indexing with a list comprehension.

df[["str3" in x for x in df['col1']]]

If you need to chain the command, use loc with a lambda:

df.loc[lambda d: ["str3" in x for x in d['col1']]]

Output:

           col1
1  [str3, str4]
4  [str1, str3]
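
For reference, both snippets can be run end to end as follows. This is only a sketch: it rebuilds the sample frame from the question, and the variable name target is introduced here for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"col1": [["str1", "str2"], ["str3", "str4"], [],
                            ["str2", "str4"], ["str1", "str3"], []]})

target = "str3"  # hypothetical variable holding the string to look for

# Boolean indexing: the list comprehension yields one True/False per row.
direct = df[[target in x for x in df["col1"]]]

# Same filter inside a method chain: loc accepts a callable returning the mask.
chained = df.loc[lambda d: [target in x for x in d["col1"]]]

print(direct)
```

Empty lists need no special handling here, because the membership test on an empty list is simply False. Both forms keep rows 1 and 4.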

Answer 2

Score: 0

Is it possible in your setup to use lambda functions? If so, the data can be filtered with a boolean mask:

df['col1'].apply(lambda x: 'str3' in x)
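
To actually filter the rows, the mask can be passed back into the indexer. A minimal sketch on the sample data, assuming the column is named col1 as in the question:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"col1": [["str1", "str2"], ["str3", "str4"], [],
                            ["str2", "str4"], ["str1", "str3"], []]})

# apply runs the membership test on each list and returns a boolean Series.
mask = df["col1"].apply(lambda x: "str3" in x)

# Indexing the frame with the mask keeps only the matching rows.
result = df[mask]
print(result)
```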

huangapple
  • Published on April 7, 2023, 00:27:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75951737.html