Pandas extract phrases in string that occur in a list

Question


I have a data frame with a column text that contains strings, as shown below:

text
my name is abc
xyz is a fruit
abc likes per

I also have a list of phrases, as shown below:

['abc', 'fruit', 'likes per']

I want to add a column terms to my data frame that contains, for each row, the phrases from the list that occur in the text string. The result in this case would be:

text                terms
my name is abc      ['abc']
xyz is a fruit      ['fruit']
abc likes per       ['abc', 'likes per']

Can I do this without using regex?

Answer 1

Score: 1


Use Series.str.findall with a regex that wraps each phrase in word boundaries (\b on both sides):

import pandas as pd

df = pd.DataFrame({"text": ["my name is abcd", "xyz is a fruit", "abc likes per"]})

L = ['abc', 'fruit', 'likes per']

pat = '|'.join(r"\b{}\b".format(x) for x in L)
df['terms'] = df['text'].str.findall(pat)

If word boundaries are not important:

df['terms1'] = df['text'].str.findall('|'.join(L))
print(df)

Output:

              text             terms            terms1
0  my name is abcd                []             [abc]
1   xyz is a fruit           [fruit]           [fruit]
2    abc likes per  [abc, likes per]  [abc, likes per]
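
One caveat when joining phrases into a pattern: any regex metacharacters inside a phrase (such as the dot in 3.5) are interpreted as regex syntax. A minimal sketch, assuming the phrases should be matched literally, escapes each phrase with re.escape before adding the word boundaries. The sample texts and the phrase list below are made up for illustration and are not from the question.

```python
import re

import pandas as pd

# Hypothetical data: the phrase "3.5" contains a regex metacharacter (the dot).
df = pd.DataFrame({"text": ["price is 3.5 usd", "price is 345 usd", "abc likes per"]})
phrases = ["3.5", "likes per"]

# Escape each phrase so "." matches only a literal dot, then add word boundaries.
pat = "|".join(r"\b{}\b".format(re.escape(p)) for p in phrases)
df["terms"] = df["text"].str.findall(pat)

# For comparison: without escaping, "3.5" also matches "345".
unescaped = "|".join(r"\b{}\b".format(p) for p in phrases)
df["terms_unescaped"] = df["text"].str.findall(unescaped)

print(df)
```

With the escaped pattern the second row gets an empty list, while the unescaped pattern reports 345 as a match for 3.5.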

Answer 2

Score: 1


I hope this works for you: use apply to check, for each phrase in the list, whether it is present in the text.

import pandas as pd
df = pd.DataFrame(data={
    "text": ["my name is abc", "xyz is a fruit", "abc likes per"]
})
lst = ['abc', 'fruit', 'likes per']
df['terms'] = df['text'].apply(lambda x: [i for i in lst if i in x])
df
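
Note that the in operator is a plain substring test, so abc would also be reported for a text such as my name is abcd. If whole-word matches are wanted without regex, one possible sketch pads both the text and the phrase with spaces before testing. It assumes that words are separated by single spaces and carry no punctuation; the helper name find_whole_phrases is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"text": ["my name is abcd", "xyz is a fruit", "abc likes per"]})
lst = ["abc", "fruit", "likes per"]


def find_whole_phrases(text, phrases):
    """Return the phrases that occur in text as whole, space-delimited words."""
    # Padding with spaces makes the first and last word behave like inner words.
    padded = f" {text} "
    return [p for p in phrases if f" {p} " in padded]


# Assumption: single-space separators and no punctuation next to the words.
df["terms"] = df["text"].apply(lambda t: find_whole_phrases(t, lst))
print(df)
```

Here the first row yields an empty list because abcd is not the whole word abc, which mirrors the word-boundary behaviour of the regex approach for this simple case.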

huangapple
  • Posted on February 8, 2023 at 13:53:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75381833.html