List Comprehension Using any() Creating Multiple Entries in Pandas

Question

I have created a list of keywords, and I'm iterating over the rows of a dataframe to set one column's value depending on whether another column contains any word from the keyword list. Here is an example:

import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog', 'This is a golden retriever type dog', 'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
    # Placeholder values: an empty list here raises ValueError because the columns differ in length
    'species': [''] * 5})

# Label each row by whether its description contains any keyword
for i, r in df.iterrows():
    if any([x in r['description'] for x in kwrds]):
        df.at[i, 'species'] = 'Canine'
    else:
        df.at[i, 'species'] = 'Feline'

The loop itself seems to work, but sometimes a cell in the species column ends up with the value repeated several times, like

CanineCanineCanineCanine

Other times it works fine.

From what I understand, any() applied to the list comprehension should return only a single True or False value. It almost seems as if the row is being iterated over multiple times with the same index, so the entry is written over and over.

The problem with that theory is that it does not happen for every row in the dataframe, only for some, and generally towards the end of the dataframe.

I'm not even sure where to start on trying to diagnose this issue.
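
One possible starting point is to test the same-index theory directly by checking whether the dataframe's index labels are unique. The sketch below uses a small hypothetical dataframe with a deliberately repeated label; it does not reproduce the repeated-value symptom and is not a confirmed diagnosis, only a way to rule this theory in or out.

```python
import pandas as pd

# Hypothetical data: the index label 1 is repeated on purpose
df = pd.DataFrame(
    {"description": ["This is a puppy", "This is a cat", "This is a dog"]},
    index=[0, 1, 1],
)

# True only when every index label occurs exactly once
index_is_unique = df.index.is_unique

# Labels that occur more than once
dup_labels = df.index[df.index.duplicated(keep=False)].unique()

# Rebuild a clean 0..n-1 index so label-based access targets a single row
clean = df.reset_index(drop=True)
```

If `index_is_unique` is False on the real data, label-based access such as `df.at[i, 'species']` can refer to more than one row, and resetting the index before the loop is one way to remove that ambiguity.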

Answer 1

Score: 1

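import numpy as np

# Vectorized alternative to the loop: join the keywords into one regex alternation
df['Species'] = np.where(df.description.str.contains("|".join(kwrds)), 'Canine', 'Feline')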

df
                           description     name Species
0                      This is a puppy    Rufus  Canine
1                        This is a dog    Dingo  Canine
2  This is a golden retriever type dog   Rascal  Canine
3                        This is a cat   MewMew  Feline
4                     this is a kitten  Jingles  Feline
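
A slightly more defensive variant of the same idea, as a sketch rather than part of the original answer: `str.contains` treats its pattern as a regular expression and is case-sensitive by default, so this version escapes each keyword with `re.escape`, ignores case, and treats missing descriptions as non-matches. The `case=False` and `na=False` choices are assumptions about the desired behaviour.

```python
import re

import numpy as np
import pandas as pd

kwrds = ["dog", "puppy", "golden retriever"]

df = pd.DataFrame({
    "description": [
        "This is a puppy",
        "This is a dog",
        "This is a golden retriever type dog",
        "This is a cat",
        "this is a kitten",
    ],
    "name": ["Rufus", "Dingo", "Rascal", "MewMew", "Jingles"],
})

# Escape each keyword so regex metacharacters in a keyword are matched literally
pattern = "|".join(re.escape(k) for k in kwrds)

# Boolean mask: True where the description contains any keyword
mask = df["description"].str.contains(pattern, case=False, na=False)

# One vectorized assignment instead of writing cell by cell in a loop
df["Species"] = np.where(mask, "Canine", "Feline")
```

Because the whole column is assigned in one step, no per-row label lookup is involved. Note that this is still substring matching, so a keyword such as "dog" also matches inside longer words.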

huangapple
  • Posted on June 1, 2023 at 02:01:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76376203.html