How do I remove rows in a Pandas dataframe that have the same values in different columns?
Question
I have a dataframe that looks like this:
Items | notebook | ballpoint | pencil | eraser | pencil sharpener | stapler | paper | scissors | glue |
---|---|---|---|---|---|---|---|---|---|
image1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
image2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
image3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
image4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
image5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
I want to delete the rows that have more than one 1 across the different columns, so that the result looks like this:
Items | notebook | ballpoint | pencil | eraser | pencil sharpener | stapler | paper | scissors | glue |
---|---|---|---|---|---|---|---|---|---|
image3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
image4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
image5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Answer 1
Score: 1
Using a numpy mask (axis=1 sums each row; without it, np.sum returns a single total for the whole array):
df[np.sum(df.values[:, 1:], axis=1) < 2]
This may be faster than a pandas-based computation.
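As a rough, self-contained sketch of the numpy-mask idea, the snippet below rebuilds the frame from the question and filters it. It assumes the indicator columns hold only 0/1 integers and that Items is the first column. The names counts, mask, and numpy_result are illustrative and not part of the original answer. The indicator columns are selected with iloc before converting to an array, so the array stays numeric instead of falling back to object dtype.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
columns = ["notebook", "ballpoint", "pencil", "eraser", "pencil sharpener",
           "stapler", "paper", "scissors", "glue"]
rows = [
    [1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
]
df = pd.DataFrame(rows, columns=columns)
df.insert(0, "Items", [f"image{i}" for i in range(1, 6)])

# Skip the first column (Items) so only the numeric indicators are summed.
counts = df.iloc[:, 1:].to_numpy().sum(axis=1)  # one count of 1s per row

# Keep rows with fewer than two 1s.
mask = counts < 2
numpy_result = df[mask]
print(numpy_result)
```

Whether this is actually faster than the pandas version depends on the size and dtypes of the frame, so it is worth timing on real data.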
Answer 2
Score: 0
You can use boolean indexing, with the row-wise sum of the values (if they are only 0/1) or of the matches as the mask:
out = df[df.drop(columns='Items').sum(axis=1).lt(2)]
Or:
out = df[df.eq(1).sum(axis=1).lt(2)]
Output:
Items notebook ballpoint pencil eraser pencil.1 sharpener stapler paper scissors glue
2 image3 0 0 0 0 1 0 0 0 0 NaN
3 image4 0 0 0 0 0 1 0 0 0 NaN
4 image5 0 0 0 0 0 0 0 1 0 NaN
Intermediate indexing Series:
df.drop(columns='Items').sum(axis=1).lt(2)
# or
# df.eq(1).sum(axis=1).lt(2)
0 False
1 False
2 True
3 True
4 True
dtype: bool
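The two variants above only agree when every value is 0 or 1. A small sketch with made-up data (the frame below is hypothetical and not from the question) shows where they differ: summing the values treats a 2 as two hits, while counting matches with eq(1) ignores it.

```python
import pandas as pd

# Hypothetical frame in which one cell holds a value other than 0/1.
df = pd.DataFrame({
    "Items": ["image1", "image2", "image3"],
    "pencil": [1, 2, 0],
    "eraser": [1, 0, 1],
})

values = df.drop(columns="Items")

# Variant 1: sum of the raw values per row (only valid for pure 0/1 data).
by_sum = df[values.sum(axis=1).lt(2)]

# Variant 2: count of cells equal to 1 per row (robust to other values).
by_match = df[values.eq(1).sum(axis=1).lt(2)]

print(by_sum["Items"].tolist())
print(by_match["Items"].tolist())
```

If the columns might contain counts or other numbers rather than strict indicators, the eq(1) form is probably the safer choice.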