Filter DataFrame column values based on whether each value contains a substring from a list

Question

I have two dataframes and I would like to see which values in a specific column from dataframe #1 have substrings that are equal to the values in a corresponding column in dataframe #2.

import pandas as pd

data = {
  'id': ['TEST-123','WORD-456']
}

data2 = {
  'id':['123','456']
}

df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data2)

I tried using

df1 = df1[df1['id'].str.contains([i for i in df2['id'].tolist()])]

but it raised TypeError: unhashable type: 'list'.

My expected dataframe in this example would be df1 left unchanged because 'TEST-123' has the substring '123' from df2 and 'WORD-456' has the substring '456' from df2.

Answer 1

Score: 1

You can build a single regex from the values in df2['id'] and pass it to str.contains:

import re

mask = df1['id'].str.contains(df2['id'].map(re.escape).str.cat(sep='|'), regex=True)

Output:

>>> df1[mask]
         id
0  TEST-123
1  WORD-456

>>> mask
0    True
1    True
Name: id, dtype: bool

>>> df2['id'].map(re.escape).str.cat(sep='|')
'123|456'
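
The re.escape step makes no difference for '123' and '456', but it matters as soon as a substring contains regex metacharacters. Here is a minimal sketch of that, using made-up values ('A+B', 'C.D') that are not part of the original question:

```python
import re
import pandas as pd

# Hypothetical data (not from the question): substrings with regex metacharacters.
values = pd.Series(['X-A+B', 'X-AAB', 'Y-C.D', 'Y-CxD'])
subs = pd.Series(['A+B', 'C.D'])

# Unescaped: '+' and '.' are treated as regex operators, so the
# pattern means "one or more A, then B" or "C, any character, D".
raw_pattern = subs.str.cat(sep='|')
raw_mask = values.str.contains(raw_pattern, regex=True)

# Escaped: every character of each substring is matched literally.
safe_pattern = subs.map(re.escape).str.cat(sep='|')
safe_mask = values.str.contains(safe_pattern, regex=True)

print(raw_mask.tolist())   # [False, True, True, True]
print(safe_mask.tolist())  # [True, False, True, False]
```

Without escaping, the literal 'X-A+B' is missed while 'X-AAB' and 'Y-CxD' match by accident; with escaping, only the values that really contain 'A+B' or 'C.D' are kept.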

Note that str.contains expects a single pattern string, not a list of strings.
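
If you would rather avoid regex entirely, one possible alternative (not part of the answer above) is a plain substring test per row. This sketch assumes every value in df1['id'] is a string with no missing values, and adds a hypothetical third row, 'NONE-789', to show a value being filtered out:

```python
import pandas as pd

# 'NONE-789' is a hypothetical extra row, added only to show a non-match.
df1 = pd.DataFrame({'id': ['TEST-123', 'WORD-456', 'NONE-789']})
df2 = pd.DataFrame({'id': ['123', '456']})

substrings = df2['id'].tolist()

# Literal substring check for each value; no regex and no escaping needed.
# Assumes all values are strings (a NaN here would raise a TypeError).
mask = df1['id'].apply(lambda value: any(sub in value for sub in substrings))

filtered = df1[mask]
print(filtered['id'].tolist())  # ['TEST-123', 'WORD-456']
```

This runs a Python-level loop per row, so it may be slower than str.contains on large frames, but it also behaves sensibly when df2 is empty: the mask is all False, whereas an empty regex pattern would match every row.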

huangapple
  • Posted on June 8, 2023 at 12:19:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76428599.html