June 26, 2023 17:40:51

How do I remove rows in a Pandas dataframe that have the same values in different columns?

Question

I have a dataframe that looks like this:

Items	notebook	ballpoint	pencil	eraser	pencil sharpener	stapler	paper	scissors
image1	1	0	1	1	0	0	0	0
image2	0	1	0	0	0	0	1	0
image3	0	0	0	0	1	0	0	0
image4	0	0	0	0	0	1	0	0
image5	0	0	0	0	0	0	0	1

I want to delete the rows that have a 1 in more than one column, so that the result looks like this:

Items	pencil sharpener	stapler	scissors
image3	1	0	0
image4	0	1	0
image5	0	0	1

Answer 1

Score: 1

Using a numpy mask:

df[np.sum(df.values[:, 1:], axis=1) < 2]

This may be faster than a pandas-based computation, although it is worth timing on your own data.
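
A minimal, self-contained sketch of the NumPy-mask idea, assuming the frame from the question is rebuilt by hand and that Items is the first column. The sum has to be taken per row (axis=1); without it, np.sum collapses the whole array to a single number and no row mask is produced. As a small variation on the one-liner above, this version slices off the Items column before converting, so NumPy works on an integer array rather than an object array.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (assumed layout: 'Items' first).
df = pd.DataFrame({
    "Items": ["image1", "image2", "image3", "image4", "image5"],
    "notebook": [1, 0, 0, 0, 0],
    "ballpoint": [0, 1, 0, 0, 0],
    "pencil": [1, 0, 0, 0, 0],
    "eraser": [1, 0, 0, 0, 0],
    "pencil sharpener": [0, 0, 1, 0, 0],
    "stapler": [0, 0, 0, 1, 0],
    "paper": [0, 1, 0, 0, 0],
    "scissors": [0, 0, 0, 0, 1],
})

# Count the 1s in each row, ignoring the label column.
counts = df.iloc[:, 1:].to_numpy().sum(axis=1)

# Keep only the rows with fewer than two 1s.
mask = counts < 2
out = df[mask]

print(out["Items"].tolist())  # ['image3', 'image4', 'image5']
```

Whether this beats the pandas version depends on the frame's size and dtypes, so a quick timeit on real data is the safest way to decide.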

Answer 2

Score: 0

You can use boolean indexing, building the mask either from the row-wise sum of the values (if they are only 0/1) or from the row-wise count of cells equal to 1:

out = df[df.drop(columns='Items').sum(axis=1).lt(2)]

Or:

out = df[df.eq(1).sum(axis=1).lt(2)]

Output:

    Items  notebook  ballpoint  pencil  eraser  pencil.1  sharpener  stapler  paper  scissors  glue
2  image3         0          0       0       0         1          0        0      0         0   NaN
3  image4         0          0       0       0         0          1        0      0         0   NaN
4  image5         0          0       0       0         0          0        0      1         0   NaN

Intermediate indexing Series:

df.drop(columns='Items').sum(axis=1).lt(2)
# or
# df.eq(1).sum(axis=1).lt(2)
0    False
1    False
2     True
3     True
4     True
dtype: bool
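
The desired result in the question also leaves out the columns that no longer contain a 1, which the row filter alone does not do. The following is a hedged sketch of one way to add that step, assuming the sample frame is rebuilt by hand with the column names from the question; the names keep and trimmed are purely illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed layout).
df = pd.DataFrame({
    "Items": ["image1", "image2", "image3", "image4", "image5"],
    "notebook": [1, 0, 0, 0, 0],
    "ballpoint": [0, 1, 0, 0, 0],
    "pencil": [1, 0, 0, 0, 0],
    "eraser": [1, 0, 0, 0, 0],
    "pencil sharpener": [0, 0, 1, 0, 0],
    "stapler": [0, 0, 0, 1, 0],
    "paper": [0, 1, 0, 0, 0],
    "scissors": [0, 0, 0, 0, 1],
})

# Step 1: keep rows with fewer than two 1s across the non-label columns.
mask = df.drop(columns="Items").sum(axis=1).lt(2)
out = df[mask]

# Step 2 (assumption about the intent): drop columns that are all zero
# in the remaining rows, keeping the 'Items' label column.
keep = out.drop(columns="Items").ne(0).any(axis=0)
trimmed = out.loc[:, ["Items", *keep[keep].index]]

print(trimmed)
```

The original row labels (2, 3, 4) are preserved; calling reset_index(drop=True) on the result renumbers them from 0 if that is preferred.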
