List Comprehension Using any() Creating Multiple Entries in Pandas

Question

I have created a list of keywords, and I'm iterating over the rows of a dataframe to set one column's value depending on whether another column contains any word from my keyword list. Here is an example:

import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog', 'This is a golden retriever type dog', 'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
    # one placeholder per row; an empty list here raises a length-mismatch ValueError
    'species': [''] * 5})

for i, r in df.iterrows():
    # any() collapses the per-keyword checks into a single True/False for this row
    if any([x in r['description'] for x in kwrds]):
        df.at[i, 'species'] = 'Canine'
    else:
        df.at[i, 'species'] = 'Feline'

The looping itself seems to work fine; however, I am running into an issue where the species column sometimes ends up with multiple entries, like:

CanineCanineCanineCanine

Other times it works fine.

From what I understand the list comprehension itself should only return a true or false value. It almost seems like the row is getting iterated over multiple times, but with the same index, so the entry is created over and over.
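To check that understanding in isolation, here is a minimal sketch without pandas. The helper name has_keyword is hypothetical and only used for illustration; the keywords and descriptions are the ones from the example above. It shows that any() returns exactly one boolean per description, however many keywords match.

```python
kwrds = ['dog', 'puppy', 'golden retriever']

descriptions = [
    'This is a puppy',
    'This is a dog',
    'This is a golden retriever type dog',  # matches two keywords
    'This is a cat',
    'this is a kitten',
]

def has_keyword(text, keywords):
    """Return a single bool: True if at least one keyword is a substring of text."""
    return any([k in text for k in keywords])

# One boolean per description, even when several keywords match the same text
results = [has_keyword(d, kwrds) for d in descriptions]
print(results)
```

So the any() call on its own cannot produce a repeated string; if a cell ends up holding a repeated value, the cause is more likely in how the value is assigned to the dataframe.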

The problem with that theory, though, is that it does not happen for every row in the dataframe. It only affects some rows, generally towards the end of the dataframe.

I'm not even sure where to start on trying to diagnose this issue.
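One possible starting point, offered only as a guess and not as a confirmed diagnosis, is to test the "same index" idea directly: label-based assignment such as df.at[i, ...] depends on the index labels, so it is worth checking whether any label occurs more than once (which can happen, for example, after concatenating dataframes). The small frame below is made up for illustration and deliberately repeats the label 1.

```python
import pandas as pd

# Hypothetical frame with a repeated index label, for illustration only
df = pd.DataFrame(
    {'description': ['This is a puppy', 'This is a cat', 'This is a dog']},
    index=[0, 1, 1],
)

# True if at least one index label appears more than once
has_repeats = not df.index.is_unique

# The labels that occur more than once
repeated = df.index[df.index.duplicated(keep=False)].unique().tolist()

# Rebuilding a fresh 0..n-1 index gives every row its own label
df_fixed = df.reset_index(drop=True)

print(has_repeats, repeated, df_fixed.index.is_unique)
```

If the real dataframe reports a unique index, this hypothesis can be ruled out and the cause lies elsewhere.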


Answer 1

Score: 1

import numpy as np

df['Species'] = np.where(df.description.str.contains("|".join(kwrds)), 'Canine', 'Feline')
df

                           description     name Species
0                      This is a puppy    Rufus  Canine
1                        This is a dog    Dingo  Canine
2  This is a golden retriever type dog   Rascal  Canine
3                        This is a cat   MewMew  Feline
4                     this is a kitten  Jingles  Feline
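The answer gives no explanation, so here is a self-contained sketch of the same vectorized idea with the steps spelled out. Joining the keywords with "|" builds a regular-expression alternation, str.contains returns one boolean per row, and np.where maps those booleans to the two labels. The re.escape call and the case=False argument are additions not present in the answer: the first guards against keywords that contain regex metacharacters, the second makes the match case-insensitive.

```python
import re

import numpy as np
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog',
                    'This is a golden retriever type dog',
                    'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
})

# Build one alternation pattern: dog|puppy|golden\ retriever
# re.escape is an added safeguard so keywords are matched literally
pattern = "|".join(re.escape(k) for k in kwrds)

# Boolean Series: True where the description contains any keyword
mask = df['description'].str.contains(pattern, case=False, regex=True)

# Assign the whole column at once instead of writing cell by cell in a loop
df['species'] = np.where(mask, 'Canine', 'Feline')

print(df)
```

Because the column is assigned in a single operation, there is no per-row write with df.at, so each row receives exactly one label.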

Posted by huangapple on June 1, 2023 at 02:01:54
When reposting, please keep the link to this article: https://go.coder-hub.com/76376203.html