How to select specific values in a pandas data frame?

Question

I am working with a pandas data frame in which some rows contain a single non-zero number and others contain more than one. I need to create a label column: for rows with exactly one non-zero number, the label should be that number, and rows with more than one non-zero number should be labelled zero.

This is an example:

[Image: example data frame]

I tried the following code, but it does not work here.

df["label"] = df[["Column1", "Column2", "Column3", "Column4", "Column5"]].max(axis=1)

Can anyone suggest a way to solve this?

Reproducible input:

import pandas as pd

df = pd.DataFrame({'A': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'B': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'C': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'D': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'E': [0, 0, 0, 0, 0, 0, 0, 0, 5]})

Answer 1

Score: 1

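You can use groupby operations:

# keep only the non-zero cells in long format, grouped by original row
g = df.filter(like='Column').replace(0, float('nan')).stack().dropna().groupby(level=0)

# take the single value where a row has exactly one non-zero cell, else 0
df['Label'] = g.first().where(g.size().eq(1), 0)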
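To make the groupby idea easier to follow, here is a step-by-step sketch. It assumes the sample data is stored under the names Column1 to Column5 (the reproducible input uses A to E, so the renaming is my assumption). It also adds an explicit dropna(), since whether stack() drops NaN by default appears to depend on the pandas version, and a reindex() step so that a row with no non-zero value would get 0 rather than NaN (also an assumption about the desired result).

```python
import pandas as pd

# Sample data from the question, with columns named Column1..Column5
# (an assumption, so that filter(like='Column') matches them).
df = pd.DataFrame({'Column1': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'Column2': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'Column3': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'Column4': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'Column5': [0, 0, 0, 0, 0, 0, 0, 0, 5]})

# Turn zeros into NaN, reshape to long format, and drop the NaN entries,
# so only the non-zero cells remain, indexed by (row, column).
nonzero = df.filter(like='Column').replace(0, float('nan')).stack().dropna()

# Group the remaining cells by their original row label.
g = nonzero.groupby(level=0)

# Keep the first (and only) value where a row has exactly one non-zero cell.
label = g.first().where(g.size().eq(1), 0)

# Rows without any non-zero cell form no group; give them 0 as well.
df['Label'] = label.reindex(df.index, fill_value=0).astype(int)

print(df['Label'].tolist())  # [0, 2, 3, 0, 0, 1, 4, 3, 0]
```

Because the zeros are replaced by NaN, the intermediate values are floats, which is why the sketch casts back to int at the end.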

Or a mask and bfill:

tmp = df.filter(like='Column')
m = tmp.ne(0)
df['Label'] = tmp.where(m).bfill(axis=1).iloc[:, 0].where(m.sum(axis=1).eq(1), 0)
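The mask and bfill one-liner can be unpacked as in the sketch below. The column names Column1 to Column5 and the intermediate variable names first_nonzero and exactly_one are my own choices for illustration, and the final cast to int is an addition, since masking with NaN turns the values into floats.

```python
import pandas as pd

# Sample data from the question, with columns named Column1..Column5
# (an assumption, so that filter(like='Column') matches them).
df = pd.DataFrame({'Column1': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'Column2': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'Column3': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'Column4': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'Column5': [0, 0, 0, 0, 0, 0, 0, 0, 5]})

tmp = df.filter(like='Column')
m = tmp.ne(0)  # True where a cell is non-zero

# Hide zeros as NaN, then back-fill along each row so that the first
# column ends up holding the row's first non-zero value.
first_nonzero = tmp.where(m).bfill(axis=1).iloc[:, 0]

# True for rows that contain exactly one non-zero cell.
exactly_one = m.sum(axis=1).eq(1)

# Keep the value for those rows and use 0 everywhere else.
df['Label'] = first_nonzero.where(exactly_one, 0).astype(int)

print(df['Label'].tolist())  # [0, 2, 3, 0, 0, 1, 4, 3, 0]
```

A row made up entirely of zeros has NaN in first_nonzero, but exactly_one is False there, so it is replaced by 0 before the cast.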

Or following your original approach with max:

tmp = df.filter(like='Column')
df['Label'] = tmp.max(axis=1).where(tmp.ne(0).sum(axis=1).eq(1), 0)
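One caveat with max: it relies on the lone non-zero value being positive, as it is in your data. If negative values can occur, a possible variation (not part of the approaches above) is to use sum instead, because the row sum equals the lone non-zero value whenever there is exactly one. The data below is hypothetical and only meant to show the difference.

```python
import pandas as pd

# Hypothetical data (not from the question) containing a negative entry.
df = pd.DataFrame({'Column1': [-3, 0, 1],
                   'Column2': [0, 2, 5],
                   'Column3': [0, 0, 0]})

tmp = df.filter(like='Column')
exactly_one = tmp.ne(0).sum(axis=1).eq(1)

# max() prefers 0 over -3 in the first row, so the lone value is lost.
label_max = tmp.max(axis=1).where(exactly_one, 0)

# sum() returns the lone non-zero value regardless of its sign.
label_sum = tmp.sum(axis=1).where(exactly_one, 0)

print(label_max.tolist())  # [0, 2, 0]
print(label_sum.tolist())  # [-3, 2, 0]
```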

Output:

   Column1  Column2  Column3  Column4  Column5  Label
0        1        2        0        0        0      0
1        0        2        0        0        0      2
2        0        0        3        0        0      3
3        0        0        3        4        0      0
4        0        2        0        4        0      0
5        1        0        0        0        0      1
6        0        0        0        4        0      4
7        0        0        3        0        0      3
8        0        4        2        0        5      0

huangapple
  • Posted on June 27, 2023 at 20:39:19
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76564964.html