How to get specific phrases from a data frame in Python?
Question
I want to print the phrases that have "no" in them.
# Assumes copy = df4['phrases'].copy(), i.e. the unsplit strings from the question
# Filter rows containing the word 'no'
nomore = copy.str.contains(r'\bno\b', na=False)
# Initialize a list to store the extracted phrases
phrases = []
# Loop through the rows that contain 'no', splitting each string into a list of words
for i in copy.loc[nomore].str.split():
    # Find the index of every 'no' in the list of words
    no_indices = [idx for idx, word in enumerate(i) if word == 'no']
    # Pair each 'no' with the word that follows it
    for idx in no_indices:
        if idx < len(i) - 1:
            phrases.append(f'{i[idx]} {i[idx + 1]}')
# Print the extracted phrases
for phrase in phrases:
    print(phrase)
This pairs each "no" with the single word that follows it, so for the sample data it prints "no more" rather than "no more worries" or "no more anxiety".
I filtered a column in my data frame for rows that contain the word "no". I want to print the phrases that have "no" in them.
For instance, if this is my dataset:
index | Column 1
------------------------------------------------------------------------
0 | no school for the rest of the year. no homework and no classes
1 | no more worries. no stress and no more anxiety
2 | no teachers telling us what to do
I want to get the words/phrases that follow the word "no". As you can see, "no" occurs more than once in some strings. I'd like my output to be:
no school
no homework
no classes
no more worries
no stress
no more anxiety
no teachers
This is my code so far:
#make a copy of the column I'd like to filter
copy = df4['phrases'].copy()
#find rows that contain the word 'no'
nomore = copy.str.contains(r'\bno\b',na=False)
#split words in each string
copy.loc[nomore] = copy[nomore].str.split()
I'm not sure how to join the phrases. I've tried:
for i in copy.loc[nomore]:
    for x in i:
        if x == 'no':
            print(x,x+1)
But this does not work. It does not seem to recognize if x == 'no', and it gives an error with x+1.
How can I fix this?
Thank you for taking the time to read my post and assist in any way that you can. I really appreciate it.
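For reference, here is a minimal sketch of a loop-based fix that keeps the question's split-and-iterate approach. The x+1 error happens because x is a string (a word), and Python cannot add an int to a str; what the loop actually needs is the word's position, which enumerate() provides. The sample DataFrame is reconstructed from the table above, and treating "more" as part of the phrase and stripping trailing punctuation are assumptions made to reproduce the desired output; this is not the asker's or any answerer's code.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df4 = pd.DataFrame({'phrases': [
    'no school for the rest of the year. no homework and no classes',
    'no more worries. no stress and no more anxiety',
    'no teachers telling us what to do',
]})

copy = df4['phrases'].copy()
nomore = copy.str.contains(r'\bno\b', na=False)

phrases = []
for words in copy[nomore].str.split():
    # words is a list of strings; enumerate() gives each word's index,
    # so the following word is words[idx + 1] (not word + 1)
    for idx, word in enumerate(words):
        if word != 'no' or idx + 1 >= len(words):
            continue
        # Assumption: include "more" in the phrase, as in the expected output
        end = idx + 2
        if words[idx + 1] == 'more' and idx + 2 < len(words):
            end = idx + 3
        # Strip trailing punctuation such as the '.' in 'worries.'
        phrases.append(' '.join(words[idx:end]).rstrip('.,!?'))

for phrase in phrases:
    print(phrase)
```

This works on a fresh copy of the strings. If the asker's copy.loc[nomore] = copy[nomore].str.split() line has already run, the elements are lists and the .str.split() call above would no longer apply.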
Answer 1
Score: 1
Here is a way with str.findall() and explode():
df['col'].str.findall(r'no (?:more )?\w+').explode().tolist()
Output:
['no school',
'no homework',
'no classes',
'no more worries',
'no stress',
'no more anxiety',
'no teachers']
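A self-contained sketch of this answer run against the question's sample data follows. Rebuilding the DataFrame is an assumption for illustration; the column is named 'col' to match the answer, whereas the question's column is 'phrases'.

```python
import pandas as pd

# Sample data from the question, stored under the column name this answer uses
df = pd.DataFrame({'col': [
    'no school for the rest of the year. no homework and no classes',
    'no more worries. no stress and no more anxiety',
    'no teachers telling us what to do',
]})

# findall() returns a list of matches per row; explode() gives each match its own row
result = df['col'].str.findall(r'no (?:more )?\w+').explode().tolist()
print(result)
```

Two things to keep in mind. First, the pattern has no leading \b, so a word ending in "no" (for example "piano lessons") would also yield a match ("no lessons"); writing r'\bno (?:more )?\w+' avoids that. Second, explode() turns a row with no matches (an empty list) into NaN, so for real data you may want .dropna() before .tolist().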
Answer 2
Score: 0
You can get a list of all the "no" phrases using str.extractall, matching "no" followed by an optional "more" and a word, and then converting the result to a list:
df['phrases'].str.extractall(r'\b(no(?:\s+more)?\s+[a-zA-Z]+)')[0].to_list()
Output:
[
'no school',
'no homework',
'no classes',
'no more worries',
'no stress',
'no more anxiety',
'no teachers'
]
You can then process the list (e.g. print it) as you like.
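A self-contained sketch of this answer on the question's sample data, including printing the phrases one per line, is below. The DataFrame is rebuilt from the table in the question for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'phrases': [
    'no school for the rest of the year. no homework and no classes',
    'no more worries. no stress and no more anxiety',
    'no teachers telling us what to do',
]})

# extractall() returns one row per match, indexed by (original row, match number);
# the single unnamed capture group becomes column 0
matches = df['phrases'].str.extractall(r'\b(no(?:\s+more)?\s+[a-zA-Z]+)')
result = matches[0].to_list()

for phrase in result:
    print(phrase)
```

Unlike findall() plus explode(), extractall() simply produces no rows for strings without a match, so no NaN values need to be dropped. Because the word part is [a-zA-Z]+, trailing punctuation such as the "." in "worries." is excluded from the match.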