August 9, 2023 02:07:31

Python Pandas - How to Keep All Observations By Selector Variable When Data are in Long Form

Question

I am working in Python 3.11.1. I have data like these stored in a pandas DataFrame:

ID  Position  Select
1   A         0
2   B         1
2   C         0
3   B         0
3   C         0
4   A         1
5   A         0

Some IDs are recorded in multiple rows, while others appear in only a single row. I need to subset this dataset by keeping every single-row ID coded 1 for Select AND keeping ALL rows of a multi-row ID if ANY of its rows is coded 1 for Select. The resulting dataset should look like:

ID  Position  Select
2   B         1
2   C         0
4   A         1

What is the best way to do this?

Ultimately, I then need to convert from long to wide form. Therefore, the final result should be:

ID  Position1  Position2  Select
2   B          C          1
4   A                     1

Thanks in advance.

Answer 1

Score: 2

Try:

import pandas as pd
# Keep all rows of an ID when any of its rows has Select == 1
u = df.groupby("ID").filter(lambda x: (x.Select == 1).any())  # or just `lambda x: x.Select.any()` if there are only 0/1 values
print(u)

Prints:

   ID Position  Select
1   2        B       1
2   2        C       0
5   4        A       1
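
On larger frames, a boolean mask built with a group-wise transform gives the same subset without calling a Python lambda once per group. The following is only a minimal sketch, assuming the sample data from the question; the names `mask` and `kept` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4, 5],
    "Position": ["A", "B", "C", "B", "C", "A", "A"],
    "Select": [0, 1, 0, 0, 0, 1, 0],
})

# True for every row whose ID has at least one row with Select == 1.
mask = df["Select"].eq(1).groupby(df["ID"]).transform("any")
kept = df[mask]
print(kept)
```

Because `mask` has the same index as `df`, it can also be combined with other row-level conditions using `&` or `|` before indexing.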

To wide form:

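# Number the rows within each ID (1, 2, ...); assign() returns a new frame instead of mutating the filtered one
u = u.assign(col=u.groupby("ID").cumcount() + 1)
# Collapse Select to a single 0/1 flag per ID
s = u.groupby("ID")["Select"].any().astype(int)
# One row per ID, one column per within-ID row number
u = u[["ID", "col", "Position"]].pivot(index="ID", columns="col")
# Flatten the ("Position", n) column MultiIndex into Position1, Position2, ...
u.columns = [f"{c[0]}{c[1]}" for c in u.columns]
print(pd.concat([u, s], axis=1).fillna("").reset_index())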

Prints:

   ID Position1 Position2  Select
0   2         B         C       1
1   4         A                 1
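
For reuse, the two steps can be wrapped in one function. This is a sketch under the assumption that `Select` holds only 0/1 values and that the columns are named as in the question; `subset_and_widen` is a hypothetical helper name, not something from pandas or the original answer.

```python
import pandas as pd


def subset_and_widen(df: pd.DataFrame) -> pd.DataFrame:
    """Keep IDs with any Select == 1, then spread Position into numbered columns."""
    # Keep every row of an ID if any of its rows has Select == 1.
    kept = df.groupby("ID").filter(lambda g: (g["Select"] == 1).any()).copy()
    # Number the rows within each ID: 1, 2, ...
    kept["col"] = kept.groupby("ID").cumcount() + 1
    # One row per ID, one column per within-ID row number.
    wide = kept.pivot(index="ID", columns="col", values="Position")
    wide.columns = [f"Position{n}" for n in wide.columns]
    # With 0/1 data, the group maximum is 1 for every kept ID.
    wide["Select"] = kept.groupby("ID")["Select"].max()
    # Blank out the missing positions and turn ID back into a column.
    return wide.fillna("").reset_index()


# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4, 5],
    "Position": ["A", "B", "C", "B", "C", "A", "A"],
    "Select": [0, 1, 0, 0, 0, 1, 0],
})

result = subset_and_widen(df)
print(result)
```

The number of `Position` columns follows the largest group, so an ID with three rows would add a `Position3` column that is blank for shorter groups.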
