How do I remove rows in a Pandas dataframe that have the same values in different columns?

Question

I have a dataframe that looks like this:

Items   notebook  ballpoint  pencil  eraser  pencil sharpener  stapler  paper  scissors  glue
image1  1         0          1       1       0                 0        0      0         0
image2  0         1          0       0       0                 0        1      0         0
image3  0         0          0       0       1                 0        0      0         0
image4  0         0          0       0       0                 1        0      0         0
image5  0         0          0       0       0                 0        0      1         0

I want to delete the rows that have more than one 1 across the columns, so that the result looks like this:

Items   notebook  ballpoint  pencil  eraser  pencil sharpener  stapler  paper  scissors  glue
image3  0         0          0       0       1                 0        0      0         0
image4  0         0          0       0       0                 1        0      0         0
image5  0         0          0       0       0                 0        0      1         0
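
To make the question reproducible, the table above could be built roughly as follows. This is only a sketch: it assumes `pencil sharpener` is a single column (which matches the nine values per row) and that `Items` is an ordinary column rather than the index.

```python
import pandas as pd

# Column names taken from the question; "pencil sharpener" is assumed
# to be one column, since each row has exactly nine 0/1 values.
columns = ["notebook", "ballpoint", "pencil", "eraser", "pencil sharpener",
           "stapler", "paper", "scissors", "glue"]

rows = {
    "image1": [1, 0, 1, 1, 0, 0, 0, 0, 0],
    "image2": [0, 1, 0, 0, 0, 0, 1, 0, 0],
    "image3": [0, 0, 0, 0, 1, 0, 0, 0, 0],
    "image4": [0, 0, 0, 0, 0, 1, 0, 0, 0],
    "image5": [0, 0, 0, 0, 0, 0, 0, 1, 0],
}

# Build with the image names as the index, then move them into an "Items" column.
df = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
df = df.rename_axis("Items").reset_index()
print(df)
```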

Answer 1

Score: 1

Using a numpy mask:

df[np.sum(df.values[:, 1:], axis=1) < 2]

This should be faster than a pandas-based computation, though it is worth timing on your own data.
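
A minimal sketch of the numpy-mask idea on a small made-up frame (the data and the variable names `values`, `row_sums` and `mask` are illustrative, not from the question). Note that the sum has to be taken per row with `axis=1`; without an axis, `np.sum` collapses the whole array to a single scalar, which cannot serve as a row mask.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the asker's data).
df = pd.DataFrame({
    "Items": ["image1", "image2", "image3"],
    "notebook": [1, 0, 0],
    "ballpoint": [0, 1, 0],
    "paper": [0, 1, 0],
    "stapler": [1, 0, 1],
})

# Take only the 0/1 columns as a numeric array.
values = df.drop(columns="Items").to_numpy()

# Sum across each row (axis=1) to count the 1s per row.
row_sums = np.sum(values, axis=1)

# Keep the rows with fewer than two 1s.
mask = row_sums < 2
out = df[mask]
print(out)
```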

Answer 2

Score: 0

You can use boolean indexing, keeping the rows where the row-wise sum of the values (valid if the data contains only 0/1) or the row-wise count of cells equal to 1 is less than 2:

out = df[df.drop(columns='Items').sum(axis=1).lt(2)]

Or:

out = df[df.eq(1).sum(axis=1).lt(2)]

Output:

    Items  notebook  ballpoint  pencil  eraser  pencil.1  sharpener  stapler  paper  scissors  glue
2  image3         0          0       0       0         1          0        0      0         0   NaN
3  image4         0          0       0       0         0          1        0      0         0   NaN
4  image5         0          0       0       0         0          0        0      1         0   NaN

Intermediate indexing Series:

df.drop(columns='Items').sum(axis=1).lt(2)
# or
# df.eq(1).sum(axis=1).lt(2)

0    False
1    False
2     True
3     True
4     True
dtype: bool
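
As a self-contained illustration of the two variants, here is a sketch on a small made-up frame (the data and the names `by_sum` and `by_count` are illustrative). It suggests that both masks agree when the value columns hold only 0/1.

```python
import pandas as pd

# Small illustrative frame (not the asker's data).
df = pd.DataFrame({
    "Items": ["image1", "image2", "image3", "image4"],
    "notebook": [1, 0, 0, 0],
    "pencil": [1, 0, 0, 0],
    "paper": [0, 1, 0, 0],
    "scissors": [0, 1, 1, 0],
})

# Variant 1: sum the 0/1 values per row, ignoring the label column.
by_sum = df.drop(columns="Items").sum(axis=1).lt(2)

# Variant 2: count the cells equal to 1 per row; the string column never matches.
by_count = df.eq(1).sum(axis=1).lt(2)

# Keep the rows with fewer than two 1s.
out = df[by_sum]
print(out)
```

Note that `.lt(2)` also keeps rows that contain no 1 at all (`image4` here). If exactly one 1 per row is required, comparing the count with `.eq(1)` instead would be one option.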

huangapple
  • Posted on June 26, 2023 at 17:40:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76555454.html