2023-02-08 13:53:43

Pandas extract phrases in string that occur in a list

Question

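This can be done without regular expressions, using Python's pandas library to work with the data frame and the list. Here is an example:

import pandas as pd
# Create the data frame
data = {'text': ['my name is abc', 'xyz is a fruit', 'abc likes per']}
df = pd.DataFrame(data)
# Create the list of phrases
phrases = ['abc', 'fruit', 'likes per']
# Define a function that finds the phrases occurring in a text
def find_matching_phrases(text):
    matching_phrases = [phrase for phrase in phrases if phrase in text]
    return matching_phrases
# Add the matching phrases as a new column
df['terms'] = df['text'].apply(find_matching_phrases)
# Print the result
print(df)

This adds a new column named 'terms' to the data frame, containing the phrases from the list that occur in each row's text. For this example, the output matches what you described.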

Original question:

I have a data frame with a column text which has strings as shown below

text
my name is abc
xyz is a fruit
abc likes per

I also have a list of phrases as shown below

['abc', 'fruit', 'likes per']

I want to add a column terms to my data frame which contains those phrases in the list that occur in the text string, so result in this case would be

text                terms
my name is abc      ['abc']
xyz is a fruit      ['fruit']
abc likes per       ['abc', 'likes per']

Can I do this without using regex?

Answer 1

Score: 1

Use Series.str.findall with regex word boundaries (\b on each side of every phrase):

import pandas as pd
df = pd.DataFrame({"text": ["my name is abcd", "xyz is a fruit", "abc likes per"]})
L = ['abc', 'fruit', 'likes per']
pat = '|'.join(r"\b{}\b".format(x) for x in L)
df['terms'] = df['text'].str.findall(pat)

If word boundaries are not important:

df['terms1'] = df['text'].str.findall('|'.join(L))
print(df)

Output:

              text             terms            terms1
0  my name is abcd                []             [abc]
1   xyz is a fruit           [fruit]           [fruit]
2    abc likes per  [abc, likes per]  [abc, likes per]
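
One caveat with the pattern above: the phrases are joined into the regex as-is, so a phrase containing a character such as `.` would be read as regex syntax rather than literal text. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name `build_pattern` and the phrase `a.c` are hypothetical, chosen only for illustration. Each phrase is passed through `re.escape`, and longer phrases are placed first so that a longer phrase is preferred over a shorter one starting at the same position.

```python
import re

def build_pattern(phrases):
    """Join phrases into one alternation, each escaped and wrapped in word boundaries."""
    # Longest phrases first, so a longer phrase is tried before a shorter
    # one that could match at the same position.
    ordered = sorted(phrases, key=len, reverse=True)
    return "|".join(r"\b{}\b".format(re.escape(p)) for p in ordered)

# 'a.c' is a hypothetical phrase; unescaped, its dot would also match 'abc'.
phrases = ["a.c", "fruit", "likes per"]
pat = build_pattern(phrases)

texts = ["my name is abc", "xyz is a fruit", "a.c likes per"]
for text in texts:
    print(text, "->", re.findall(pat, text))
```

Since `Series.str.findall` applies `re.findall` to each element, the same pattern should be usable as `df['text'].str.findall(pat)`.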

Answer 2

Score: 1

I hope this works for you: use apply to check, for each phrase in the list, whether it is present in the text.

import pandas as pd
df = pd.DataFrame(data={
    "text": ["my name is abc", "xyz is a fruit", "abc likes per"]
})
lst = ['abc', 'fruit', 'likes per']
# Keep every phrase from lst that occurs as a substring of the text
df['terms'] = df['text'].apply(lambda x: [i for i in lst if i in x])
print(df)
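
The `in` check used above is a substring test, so 'abc' would also be reported for a text such as 'my name is abcd'. If whole-word matches are wanted while still avoiding regex, one possible sketch is to pad both the text and the phrase with spaces before testing. This assumes words are separated by whitespace only and carry no punctuation. The helper name `find_whole_phrases` is hypothetical, not from the original answer.

```python
def find_whole_phrases(text, phrases):
    """Return the phrases that occur in text as whole whitespace-separated word runs."""
    # Collapse runs of whitespace and pad with spaces so that every word,
    # including the first and last, has a space on both sides.
    padded = " " + " ".join(text.split()) + " "
    return [p for p in phrases if " " + p + " " in padded]

phrases = ["abc", "fruit", "likes per"]
for text in ["my name is abcd", "xyz is a fruit", "abc likes per"]:
    print(text, "->", find_whole_phrases(text, phrases))
```

With the data frame from this answer, it could be applied as `df['text'].apply(lambda x: find_whole_phrases(x, lst))`.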
