June 1, 2023, 02:01:54

List Comprehension Using any() Creating Multiple Entries in Pandas

Question

I have created a list of keywords, and I'm iterating over the rows of a dataframe to set one column's value depending on whether another column contains any word from my keyword list. Here is an example:

import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog', 'This is a golden retriever type dog', 'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
    # Placeholder values: an empty list here raises ValueError because the columns differ in length
    'species': [''] * 5})

for i, r in df.iterrows():
    # any() is True if at least one keyword occurs in the description
    if any([x in r['description'] for x in kwrds]):
        df.at[i, 'species'] = 'Canine'
    else:
        df.at[i, 'species'] = 'Feline'

The loop itself seems to work, but sometimes the species column ends up with repeated entries such as

CanineCanineCanineCanine

while at other times it works fine.

From what I understand, any() over the list comprehension should return only a single True or False value. It almost seems as if the row is being iterated over multiple times with the same index, so the entry is written over and over.

The problem with that theory is that it does not happen for every row in the dataframe, only for some, and generally toward the end of the dataframe.

I'm not even sure where to start on trying to diagnose this issue.
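
One possible starting point is to test the two hypotheses raised above separately: that any() yields more than one value per row, and that some rows share an index label. The sketch below uses a small hypothetical frame with a deliberately repeated label (the question does not confirm that the real data looks like this), so it only shows how to rule these causes in or out, not what the actual cause is.

```python
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

# Hypothetical frame with a repeated index label (0 appears twice),
# e.g. as left behind by pd.concat without ignore_index=True.
df = pd.DataFrame(
    {'description': ['This is a puppy', 'This is a cat', 'This is a dog']},
    index=[0, 1, 0],
)

# any() collapses the per-keyword checks into a single bool per row.
flags = [any(x in text for x in kwrds) for text in df['description']]

# .at addresses rows by label, so repeated labels are worth ruling out.
index_is_unique = df.index.is_unique
repeated_labels = df.index[df.index.duplicated()].tolist()

# Rebuilding a clean 0..n-1 index removes any repeated labels.
clean = df.reset_index(drop=True)

print(flags)
print(index_is_unique, repeated_labels)
print(clean.index.is_unique)
```

If the index of the real dataframe turns out to be unique, this hypothesis can be dropped and the search moved elsewhere, for example to how the species column is created or combined before the loop runs.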

Answer 1

Score: 1

import numpy as np

# Join the keywords into one regex alternation and label each row in a single vectorized step
df['Species'] = np.where(df.description.str.contains("|".join(kwrds)), 'Canine', 'Feline')
df
                           description     name Species
0                      This is a puppy    Rufus  Canine
1                        This is a dog    Dingo  Canine
2  This is a golden retriever type dog   Rascal  Canine
3                        This is a cat   MewMew  Feline
4                     this is a kitten  Jingles  Feline
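
The one-liner above can be unpacked into its three steps, which may make it easier to see why no row loop is needed. This is a sketch rather than the answerer's exact code: the intermediate names pattern and mask are introduced here for illustration, and the re.escape call is an added precaution (not in the original answer) for keywords that might contain regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog',
                    'This is a golden retriever type dog',
                    'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
})

# Escape each keyword so regex metacharacters are matched literally,
# then join them into one alternation pattern.
pattern = "|".join(re.escape(k) for k in kwrds)

# str.contains treats the pattern as a regex and returns one boolean per row.
mask = df['description'].str.contains(pattern)

# np.where picks 'Canine' where the mask is True and 'Feline' elsewhere.
df['Species'] = np.where(mask, 'Canine', 'Feline')

print(df['Species'].tolist())
```

Because the whole column is assigned at once from a boolean mask, each row receives exactly one label regardless of how the index is laid out.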
