June 27, 2023 20:39:19

How to select specific values in a pandas data frame?

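Question

I am working with a pandas data frame in which some rows contain a single non-zero number and others contain more than one. I need to create a label column: rows with exactly one number should get that number as the label, and rows with more than one number should be assigned zero.

This is an example:

I tried the following code, but it does not work here.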

df["label"] = df[["Column1", "Column2", "Column3", "Column4", "Column5"]].max(axis=1)

Can anyone suggest a way to solve this?

Reproducible input:

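import pandas as pd

# Column names follow the Column1..Column5 naming used in the attempt above
df = pd.DataFrame({'Column1': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'Column2': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'Column3': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'Column4': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'Column5': [0, 0, 0, 0, 0, 0, 0, 0, 5]})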

Answer 1

Score: 1

You can use groupby operations:

g = df.filter(like='Column').replace(0, float('nan')).stack().groupby(level=0)
df['Label'] = g.first().where(g.size().eq(1), 0)
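
A note on the groupby version: it relies on stack() discarding the NaN entries. That is the long-standing default, but as far as I know the newer stack implementation (future_stack=True, available from pandas 2.1) keeps them. Below is a minimal sketch that makes the drop explicit with dropna() and also fills rows that have no non-zero value at all. The sample frame is the question's data with Column1..Column5 names, and the final astype(int) assumes the values are whole numbers.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 0, 0, 0, 0, 1, 0, 0, 0],
                   'Column2': [2, 2, 0, 0, 2, 0, 0, 0, 4],
                   'Column3': [0, 0, 3, 3, 0, 0, 0, 3, 2],
                   'Column4': [0, 0, 0, 4, 4, 0, 4, 0, 0],
                   'Column5': [0, 0, 0, 0, 0, 0, 0, 0, 5]})

cols = df.filter(like='Column')
# Long format: one entry per (row, column) pair; zeros become NaN and are dropped explicitly
nonzero = cols.replace(0, float('nan')).stack().dropna()
g = nonzero.groupby(level=0)
# Rows with exactly one surviving value keep it; all other rows become 0
label = g.first().where(g.size().eq(1), 0)
# Rows with no non-zero value are absent from the groups, so fill them with 0
# (astype(int) is an assumption: it only makes sense for whole-number data)
df['Label'] = label.reindex(df.index, fill_value=0).astype(int)
```

Without the reindex step, an all-zero row would end up with NaN in Label, because it never appears in the grouped result.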

Or a mask and bfill:

tmp = df.filter(like='Column')
m = tmp.ne(0)
df['Label'] = tmp.where(m).bfill(axis=1).iloc[:, 0].where(m.sum(axis=1).eq(1), 0)
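
To see why the bfill version works, it can help to name the intermediate steps. The sketch below uses three made-up rows rather than the question's full data; the variable names masked, first_nonzero and single are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 0, 0],
                   'Column2': [2, 2, 0],
                   'Column3': [0, 0, 3]})

tmp = df.filter(like='Column')
m = tmp.ne(0)                  # True where a cell is non-zero
masked = tmp.where(m)          # zeros replaced by NaN
# Back-fill along each row so the first column holds the row's first non-zero value
first_nonzero = masked.bfill(axis=1).iloc[:, 0]
single = m.sum(axis=1).eq(1)   # True for rows with exactly one non-zero value
df['Label'] = first_nonzero.where(single, 0)
```

Because the mask introduces NaN, the resulting Label column is floating point; casting it back to integers is optional and depends on the data.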

Or, following your original approach with max:

tmp = df.filter(like='Column')
df['Label'] = tmp.max(axis=1).where(tmp.ne(0).sum(axis=1).eq(1), 0)

Output:

   Column1  Column2  Column3  Column4  Column5  Label
0        1        2        0        0        0      0
1        0        2        0        0        0      2
2        0        0        3        0        0      3
3        0        0        3        4        0      0
4        0        2        0        4        0      0
5        1        0        0        0        0      1
6        0        0        0        4        0      4
7        0        0        3        0        0      3
8        0        4        2        0        5      0
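
One caveat with the max version: it assumes the non-zero values are positive. If a row's only non-zero value were negative, max(axis=1) would return 0 instead of that value. A possible alternative, assuming numeric columns, is sum(axis=1): when exactly one entry in a row is non-zero, the row sum equals that entry. The negative value and the all-zero row in the sample below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 0, 0, -7],
                   'Column2': [2, 2, 0, 0],
                   'Column3': [0, 0, 0, 0]})

tmp = df.filter(like='Column')
# True for rows with exactly one non-zero value
single = tmp.ne(0).sum(axis=1).eq(1)
# With a single non-zero entry, the row sum is that entry, whatever its sign
df['Label'] = tmp.sum(axis=1).where(single, 0)
# For comparison: the max-based label loses the negative value in the last row
label_from_max = tmp.max(axis=1).where(single, 0)
```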
