How to select specific values in a pandas data frame?
Question
I am working with a pandas data frame in which some rows contain a single nonzero number and others contain more than one. I need to create a label column that copies the value from each row with exactly one nonzero number; rows with more than one nonzero value should be assigned zero.
I tried the following code, but it does not do what I need, because it returns the row maximum even for rows with several nonzero values.
df["label"] = df[["Column1", "Column2", "Column3", "Column4", "Column5"]].max(axis=1)
Can anyone suggest a way to solve this?
Reproducible input:
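import pandas as pd

# Column names follow the Column1..Column5 naming used in the code above
df = pd.DataFrame({'Column1': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'Column2': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'Column3': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'Column4': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'Column5': [0, 0, 0, 0, 0, 0, 0, 0, 5]})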
Answer 1
Score: 1
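You can use groupby operations:
# Treat zeros as missing, then reshape to one entry per (row, column) pair
g = df.filter(like='Column').replace(0, float('nan')).stack().groupby(level=0)
# Keep the first non-missing value where a row has exactly one; count() ignores NaN
df['Label'] = g.first().where(g.count().eq(1), 0)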
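Or a mask and bfill:
tmp = df.filter(like='Column')
m = tmp.ne(0)  # True where a cell is nonzero
# Back-fill along each row so the first column holds the first nonzero value
df['Label'] = tmp.where(m).bfill(axis=1).iloc[:, 0].where(m.sum(axis=1).eq(1), 0)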
Or following your original approach with max:
tmp = df.filter(like='Column')
# Keep the row maximum only where exactly one cell is nonzero
df['Label'] = tmp.max(axis=1).where(tmp.ne(0).sum(axis=1).eq(1), 0)
Output:
   Column1  Column2  Column3  Column4  Column5  Label
0        1        2        0        0        0      0
1        0        2        0        0        0      2
2        0        0        3        0        0      3
3        0        0        3        4        0      0
4        0        2        0        4        0      0
5        1        0        0        0        0      1
6        0        0        0        4        0      4
7        0        0        3        0        0      3
8        0        4        2        0        5      0
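As a self-contained check, the max-based idea can be wrapped in a small helper. This is only a sketch: the function name label_single_nonzero and its prefix parameter are hypothetical, and it swaps max for a row sum, on the assumption that the single nonzero value might be negative (in which case max would return 0 instead of that value). It also assumes the data contain no NaN.

```python
import pandas as pd


def label_single_nonzero(frame: pd.DataFrame, prefix: str = "Column") -> pd.Series:
    """Return each row's lone nonzero value, or 0 if the row has none or several."""
    values = frame.filter(like=prefix)
    nonzero_count = values.ne(0).sum(axis=1)
    # With exactly one nonzero entry, the row sum equals that entry,
    # which also holds when the entry is negative.
    return values.sum(axis=1).where(nonzero_count.eq(1), 0)


df = pd.DataFrame({
    "Column1": [1, 0, 0, 0, 0, 1, 0, 0, 0],
    "Column2": [2, 2, 0, 0, 2, 0, 0, 0, 4],
    "Column3": [0, 0, 3, 3, 0, 0, 0, 3, 2],
    "Column4": [0, 0, 0, 4, 4, 0, 4, 0, 0],
    "Column5": [0, 0, 0, 0, 0, 0, 0, 0, 5],
})

df["Label"] = label_single_nonzero(df)
print(df["Label"].tolist())  # [0, 2, 3, 0, 0, 1, 4, 3, 0]
```

For non-negative data like the example above, this gives the same labels as the max-based version.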