How to get specific phrases from a data frame in Python?

Question

I filtered a column in my data frame that contains the word "no". I want to print the phrases that have "no" in them.

For instance, if this is my dataset:

index | Column 1
------------------------------------------------------------------------ 
  0   | no school for the rest of the year. no homework and no classes
  1   | no more worries. no stress and no more anxiety
  2   | no teachers telling us what to do

I want to get the words/phrases that come after the word "no". As you can see, the word "no" occurs more than once in some strings. I'd want my output to be:

no school
no homework
no classes
no more worries
no stress
no more anxiety
no teachers

This is my code so far:

# make a copy of the column I'd like to filter
copy = df4['phrases'].copy()

# find rows that contain the word 'no'
nomore = copy.str.contains(r'\bno\b', na=False)

# split words in each string
copy.loc[nomore] = copy[nomore].str.split()

I'm not sure how to join the phrases. I've tried:

for i in copy.loc[nomore]:
    for x in i:
        if x == 'no':
            print(x, x+1)

But this does not work. It does not recognize if x == 'no', and it gives an error with x+1.

How can I fix this?

Thank you for taking the time to read my post and assist in any way that you can. I really appreciate it.
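
On the error itself: in the loop above, x is a word (a string), so x+1 tries to add an integer to a string and raises a TypeError. The loop needs the position of each word rather than the word itself. A minimal sketch of a loop-based fix follows, assuming a data frame named df4 with a phrases column as in the question. The sample rows are copied from the example, and the rule of keeping the word after "more" is an assumption drawn from the desired output.

```python
import pandas as pd

# Sample data copied from the question; frame and column names follow the asker's code.
df4 = pd.DataFrame({
    "phrases": [
        "no school for the rest of the year. no homework and no classes",
        "no more worries. no stress and no more anxiety",
        "no teachers telling us what to do",
    ]
})

phrases = []
for text in df4["phrases"].dropna():
    words = text.split()
    # enumerate() yields each word's position, so the next word is words[i + 1]
    for i, word in enumerate(words):
        if word != "no" or i + 1 >= len(words):
            continue
        parts = [word, words[i + 1]]
        # Assumption: "no more X" should be kept as a three-word phrase
        if words[i + 1] == "more" and i + 2 < len(words):
            parts.append(words[i + 2])
        # Drop trailing punctuation such as the period in "worries."
        phrases.append(" ".join(parts).rstrip(".,;:!?"))

for phrase in phrases:
    print(phrase)
```

This keeps the explicit loop from the question; the regex-based answers do the same work in a single vectorized call.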

Answer 1

Score: 1

Here is a way with str.findall() and explode():

df['col'].str.findall(r'no (?:more )?\w+').explode().tolist()

Output:

['no school',
 'no homework',
 'no classes',
 'no more worries',
 'no stress',
 'no more anxiety',
 'no teachers']
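
A self-contained version of this approach, as a sketch: the sample frame is rebuilt from the question, and the column is named phrases as in the asker's code rather than col. The leading \b and the dropna() call are additions that are not part of the one-liner. Without \b the pattern would also match at the end of a longer word, as in "piano keys"; dropna() discards the NaN that explode() produces for rows with no match.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "phrases": [
        "no school for the rest of the year. no homework and no classes",
        "no more worries. no stress and no more anxiety",
        "no teachers telling us what to do",
    ]
})

# \b keeps "no" from matching at the end of a longer word (e.g. "piano")
pattern = r"\bno (?:more )?\w+"

# findall() returns one list of matches per row;
# explode() flattens those lists into one match per row
matches = df["phrases"].str.findall(pattern).explode().dropna().tolist()
print(matches)

# A row without a standalone "no" contributes nothing to the result
no_match = pd.Series(["piano keys"]).str.findall(pattern).explode().dropna().tolist()
print(no_match)
```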

Answer 2

Score: 0

You can get a list of all the "no" phrases using str.extractall, matching "no" followed by an optional "more" and a word, and then converting the result to a list:

df['phrases'].str.extractall(r'\b(no(?:\s+more)?\s+[a-zA-Z]+)')[0].to_list()

Output:

[
 'no school',
 'no homework',
 'no classes',
 'no more worries',
 'no stress',
 'no more anxiety',
 'no teachers'
]

You can then process the list (e.g. print) as you desire.
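
A runnable sketch of the same idea, with the sample rows from the question rebuilt for illustration. str.extractall returns a data frame with one column per capture group, indexed by the original row and a match level, which is why column 0 is selected before converting to a list.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "phrases": [
        "no school for the rest of the year. no homework and no classes",
        "no more worries. no stress and no more anxiety",
        "no teachers telling us what to do",
    ]
})

pattern = r"\b(no(?:\s+more)?\s+[a-zA-Z]+)"

# One row per match; the MultiIndex records the source row
# and the match number within that row
extracted = df["phrases"].str.extractall(pattern)
print(extracted)

# Column 0 holds the text captured by the single group in the pattern
phrases = extracted[0].to_list()
for phrase in phrases:
    print(phrase)
```

Keeping the intermediate frame is useful when each phrase needs to be traced back to the row it came from.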

huangapple
  • Posted on May 26, 2023 at 09:04:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76337052.html