List Comprehension Using any() Creating Multiple Entries in Pandas
Question
I have a list of keywords, and I'm iterating over the rows of a dataframe to set one column's value depending on whether another column contains any word from the keyword list. Here is an example:
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']
df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog', 'This is a golden retriever type dog', 'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
    # Placeholder values; an empty list here raises a length-mismatch ValueError
    'species': [None] * 5})
for i, r in df.iterrows():
    # True if at least one keyword is a substring of the description
    if any([x in r['description'] for x in kwrds]):
        df.at[i, 'species'] = 'Canine'
    else:
        df.at[i, 'species'] = 'Feline'
The loop itself seems to work fine; however, sometimes the species column ends up with repeated entries such as
CanineCanineCanineCanine
while other times it works fine.
From what I understand, any() over the list comprehension should only return a single True or False value. It almost seems like the row is getting iterated over multiple times, but with the same index, so the entry is created over and over.
The problem with that theory, though, is that it does not happen for every row in the dataframe: only some, and generally towards the end of the dataframe.
I'm not even sure where to start in diagnosing this issue.
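One possible starting point for diagnosis is to test the "same index" theory directly by checking whether the dataframe's index labels are unique. The sketch below uses a hypothetical three-row frame with a deliberately repeated label (the names flags, repeated_labels, and clean are illustrative only). It does not establish that duplicate labels are the cause of the repeated entries; it only shows how to rule that theory in or out.

```python
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

# Hypothetical frame whose index label 0 appears twice, as can happen
# after pd.concat without ignore_index=True (an assumption for illustration).
df = pd.DataFrame(
    {'description': ['This is a puppy', 'This is a cat', 'This is a dog']},
    index=[0, 1, 0],
)

# any() collapses the per-keyword checks into exactly one bool per row.
flags = [any(k in text for k in kwrds) for text in df['description']]

# If the index is not unique, a label no longer identifies a single row.
index_is_unique = df.index.is_unique
repeated_labels = df.index[df.index.duplicated(keep=False)].unique().tolist()

# Rebuilding a 0..n-1 index makes every label refer to exactly one row.
clean = df.reset_index(drop=True)
```

If index_is_unique comes back True on the real data, the index is not the problem and the cause lies elsewhere.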
Answer 1
Score: 1
import numpy as np

# Join the keywords into one regex alternation and test every description at once
df['Species'] = np.where(df.description.str.contains("|".join(kwrds)), 'Canine', 'Feline')
df
                           description     name Species
0                      This is a puppy    Rufus  Canine
1                        This is a dog    Dingo  Canine
2  This is a golden retriever type dog   Rascal  Canine
3                        This is a cat   MewMew  Feline
4                     this is a kitten  Jingles  Feline
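The one-liner above replaces the row loop with a single vectorized test: the keywords are joined into a regex alternation, str.contains checks every description against it, and np.where maps the resulting boolean mask to a label. A self-contained sketch of the same idea follows. The re.escape call, case=False, and na=False are additions not present in the answer, assuming the keywords are meant as literal text, that matching should ignore case, and that missing descriptions should count as non-matches.

```python
import re

import numpy as np
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']
df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog',
                    'This is a golden retriever type dog',
                    'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
})

# Build one alternation pattern; re.escape keeps each keyword literal
# (assumption: the keywords are plain text, not regular expressions).
pattern = "|".join(re.escape(k) for k in kwrds)

# One boolean per row; na=False treats missing descriptions as no match.
mask = df['description'].str.contains(pattern, case=False, na=False)

# Map the mask to labels in a single assignment, with no per-row writes.
df['species'] = np.where(mask, 'Canine', 'Feline')
```

Because the column is assigned once from a mask aligned to the rows, this approach does not depend on index labels the way per-row df.at writes do.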