2023-05-26 09:04:39

How to get specific phrases from data frame in python?

Question

I want to print the phrases that have "no" in them.

# `copy` is assumed to be the column of strings, e.g. copy = df4['phrases'].copy()
# Filter rows containing the standalone word 'no'
nomore = copy.str.contains(r'\bno\b', na=False)
# Initialize a list to store the extracted phrases
phrases = []
# Loop through the matching rows, splitting each string into a list of words
for words in copy.loc[nomore].str.split():
    # Find the positions of 'no' in the list of words
    no_indices = [idx for idx, word in enumerate(words) if word == 'no']

    # Pair each 'no' with the word that follows it
    for idx in no_indices:
        if idx < len(words) - 1:
            phrases.append(f'{words[idx]} {words[idx + 1]}')
# Print the extracted phrases
for phrase in phrases:
    print(phrase)

This code extracts and prints each "no" together with the single word that follows it, so a phrase such as "no more worries" comes out as "no more".

I filtered a column in my data frame that contains the word "no". I want to print the phrases that have "no" in them.

For instance, if this is my dataset:

index | Column 1
------------------------------------------------------------------------ 
  0   | no school for the rest of the year. no homework and no classes
  1   | no more worries. no stress and no more anxiety
  2   | no teachers telling us what to do

I want to get words/phrases that come after the word "no". As you can see, the word "no" occurs more than 1 time in some strings. I'd want my output to be

no school
no homework
no classes
no more worries
no stress
no more anxiety
no teachers

This is my code so far :

# make a copy of the column I'd like to filter
copy = df4['phrases'].copy()
# find rows that contain the word 'no'
nomore = copy.str.contains(r'\bno\b', na=False)
# split words in each string
copy.loc[nomore] = copy[nomore].str.split()

I'm not sure how to join the phrases. I've tried:

for i in copy.loc[nomore]:
    for x in i:
        if x == 'no':
            print(x, x+1)

But this does not work. It does not recognize if x == 'no', and it gives an error with x+1.

How can I fix this?

Thank you for taking the time to read my post and assist in any way that you can. I really appreciate it.

Answer 1

Score: 1

Here is one way, using str.findall() and explode():

df['col'].str.findall(r'no (?:more )?\w+').explode().tolist()

Output:

['no school',
 'no homework',
 'no classes',
 'no more worries',
 'no stress',
 'no more anxiety',
 'no teachers']
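
To try this end to end, here is a minimal sketch that rebuilds the sample data from the question. The column name 'col' follows the answer above, and the leading \b is an addition not in the original pattern: it is assumed you do not want "no" matched inside longer words such as "piano". The dropna() call is also an assumption, covering rows that contain no match at all.

```python
import pandas as pd

# Sample data copied from the question; the column name 'col' is the one used in the answer.
df = pd.DataFrame({
    "col": [
        "no school for the rest of the year. no homework and no classes",
        "no more worries. no stress and no more anxiety",
        "no teachers telling us what to do",
    ]
})

# \b (an addition to the original pattern) stops 'no' from matching inside longer words.
pattern = r"\bno (?:more )?\w+"

# findall returns a list of matches per row; explode puts each match on its own row.
# Rows without any match become NaN after explode, so they are dropped.
matches = df["col"].str.findall(pattern).explode().dropna()
phrases = matches.tolist()

for phrase in phrases:
    print(phrase)
```

Because the pattern only knows about the optional word "more", other two-word phrases (for example "no good reason") would still be cut after the first word.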

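Answer 2

Score: 0

You can get a list of all the "no" phrases using str.extractall, matching "no" followed by an optional "more" and a word, and then converting that result to a list: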

df['phrases'].str.extractall(r'\b(no(?:\s+more)?\s+[a-zA-Z]+)')[0].to_list()

Output:

[
 'no school',
 'no homework',
 'no classes',
 'no more worries',
 'no stress',
 'no more anxiety',
 'no teachers'
]

You can then process the list (e.g. print) as you desire.
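
As a rough illustration of what str.extractall returns before the conversion to a list, the sketch below rebuilds the sample data from the question and also keeps track of which row each phrase came from. The variable names (extracted, by_row, rows, phrase_list) are made up for this example.

```python
import pandas as pd

# Sample data copied from the question, stored in the 'phrases' column used above.
df = pd.DataFrame({
    "phrases": [
        "no school for the rest of the year. no homework and no classes",
        "no more worries. no stress and no more anxiety",
        "no teachers telling us what to do",
    ]
})

pattern = r"\b(no(?:\s+more)?\s+[a-zA-Z]+)"

# extractall returns a DataFrame with one row per match. Its index has two levels:
# the original row label and a 'match' counter. The single capture group is column 0.
extracted = df["phrases"].str.extractall(pattern)

# Flat list of phrases, as in the answer.
phrase_list = extracted[0].to_list()

# Dropping the 'match' level leaves the original row label for each phrase.
by_row = extracted[0].droplevel("match")
rows = by_row.index.tolist()

for row, phrase in zip(rows, phrase_list):
    print(row, phrase)
```

Keeping the row labels can be handy if the phrases later need to be joined back to the original data frame.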
