June 8, 2023, 12:19:56

Filter column values in DataFrame based on if value contains a substring from list

Question

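I have two DataFrames, and I want to find which values in a specific column of DataFrame #1 contain a substring equal to a value in the corresponding column of DataFrame #2.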

import pandas as pd

data = {
  'id': ['TEST-123', 'WORD-456']
}
data2 = {
  'id': ['123', '456']
}
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data2)

I tried using:

df1 = df1[df1['id'].str.contains([i for i in df2.tolist()])]

but it raised the error TypeError: unhashable type: 'list'.

In this example, I expect df1 to be left unchanged, because 'TEST-123' contains the substring '123' from df2 and 'WORD-456' contains the substring '456' from df2.

Answer 1

Score: 1

You can build a regular expression and pass it to str.contains:

import re
# Escape each value in df2['id'], join them with '|' into one pattern, then test every id in df1
mask = df1['id'].str.contains(df2['id'].map(re.escape).str.cat(sep='|'), regex=True)

Output:

>>> df1[mask]
         id
0  TEST-123
1  WORD-456
>>> mask
0    True
1    True
Name: id, dtype: bool
>>> df2['id'].map(re.escape).str.cat(sep='|')
'123|456'
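
The session above can also be written as a small standalone script. This is only a sketch: the extra row 'NONE-789' and the names pattern and filtered are assumptions added for illustration, so that the mask has a False entry and the filtering is visible.

```python
import re

import pandas as pd

# 'NONE-789' is a hypothetical extra row, not part of the original question.
df1 = pd.DataFrame({'id': ['TEST-123', 'WORD-456', 'NONE-789']})
df2 = pd.DataFrame({'id': ['123', '456']})

# Escape each substring so regex metacharacters are matched literally,
# then join the alternatives with '|' into a single pattern string.
pattern = '|'.join(re.escape(s) for s in df2['id'])

# One boolean per row of df1: True where any alternative matches.
mask = df1['id'].str.contains(pattern, regex=True)
filtered = df1[mask]

print(pattern)
print(filtered)
```

One caveat worth keeping in mind: if df2 is empty, the joined pattern is an empty string, which matches every row, so a guard for that case may be needed.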

Note that str.contains expects a single string (the pattern), not a list of strings.
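
Since str.contains takes one pattern rather than a list, another option is to skip regular expressions and test each value against the list with plain Python substring checks. The following is a minimal sketch of that alternative, not the answer's method; the row 'NONE-789' and the names substrings and filtered are assumptions for illustration.

```python
import pandas as pd

# 'NONE-789' is a hypothetical extra row, not part of the original question.
df1 = pd.DataFrame({'id': ['TEST-123', 'WORD-456', 'NONE-789']})
df2 = pd.DataFrame({'id': ['123', '456']})

# Plain Python list of the substrings to look for.
substrings = df2['id'].tolist()

# True when at least one substring occurs anywhere in the value.
mask = df1['id'].apply(lambda value: any(s in value for s in substrings))
filtered = df1[mask]

print(filtered)
```

This avoids escaping entirely, at the cost of a Python-level loop per row, so the regex version is likely the better fit for large frames.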
