Filter column values in DataFrame based on if value contains a substring from list
Question
I have two dataframes and I would like to see which values in a specific column from dataframe #1 have substrings that are equal to the values in a corresponding column in dataframe #2.
import pandas as pd

data = {
    'id': ['TEST-123', 'WORD-456']
}
data2 = {
    'id': ['123', '456']
}
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
I tried using
df1 = df1[df1['id'].str.contains([i for i in df2.tolist()])]
but got the error "TypeError: unhashable type: 'list'".
My expected dataframe in this example would be df1 left unchanged because 'TEST-123' has the substring '123' from df2 and 'WORD-456' has the substring '456' from df2.
Answer 1
Score: 1
You can build a regex and pass it to str.contains:
import re
mask = df1['id'].str.contains(df2['id'].map(re.escape).str.cat(sep='|'), regex=True)
Output:
>>> df1[mask]
         id
0  TEST-123
1  WORD-456
>>> mask
0    True
1    True
Name: id, dtype: bool
>>> df2['id'].map(re.escape).str.cat(sep='|')
'123|456'
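Note that str.contains expects a single string pattern, not a list of strings. Joining the escaped values with '|' turns the list into one alternation pattern that matches if any of the substrings is present.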
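For reference, here is a minimal self-contained sketch of the approach above, plus a regex-free alternative that may be easier to read for small lists. The extra row 'OTHER-789' and the variable names (pattern, regex_mask, plain_mask, filtered) are assumptions added for illustration, so that the filter visibly drops something; they are not part of the original question.

```python
import re

import pandas as pd

# 'OTHER-789' is a hypothetical extra row, added so one value fails the filter.
df1 = pd.DataFrame({"id": ["TEST-123", "WORD-456", "OTHER-789"]})
df2 = pd.DataFrame({"id": ["123", "456"]})

# Escape each substring so regex metacharacters are matched literally,
# then join with '|' to build a single alternation pattern.
pattern = "|".join(re.escape(s) for s in df2["id"])

# str.contains takes one pattern string, not a list.
regex_mask = df1["id"].str.contains(pattern, regex=True)

# Regex-free alternative: plain substring test against every candidate.
substrings = df2["id"].tolist()
plain_mask = df1["id"].apply(
    lambda value: any(sub in value for sub in substrings)
)

# Keep only the rows whose id contains at least one substring from df2.
filtered = df1[regex_mask]
print(filtered)
```

Both masks should select the same rows here. The regex version is vectorized by pandas, while the apply version runs a Python loop per row, so the regex form is likely the better choice for large frames.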