Python Pandas - How to Keep All Observations By Selector Variable When Data are in Long Form

Question

I am working in Python 3.11.1. I have data like the following stored in a pandas DataFrame:

ID  Position  Select
1   A         0
2   B         1
2   C         0
3   B         0
3   C         0
4   A         1
5   A         0

Some IDs are recorded in multiple rows, while others appear in only a single row. I need to subset this dataset by keeping every single-row ID coded 1 for Select AND keeping ALL rows of a multi-row ID if ANY of its rows is coded 1 for Select. The resulting dataset should look like:

ID  Position  Select
2   B         1
2   C         0
4   A         1

What is the best way to do this?

Ultimately, I then need to convert from long to wide form. Therefore, the final result should be:

ID  Position1  Position2  Select
2   B          C          1
4   A                     1

Thanks in advance.

Answer 1

Score: 2

Try:

# Keep every row of each ID group that contains at least one Select == 1.
# If Select only holds 0/1 values, `lambda x: x.Select.any()` is equivalent.
u = df.groupby("ID").filter(lambda x: (x["Select"] == 1).any())
print(u)

Prints:

   ID Position  Select
1   2        B       1
2   2        C       0
5   4        A       1
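
As a self-contained check of the filtering step, the sketch below rebuilds the question's sample data and applies the same rule with a boolean mask from `transform("any")` rather than `filter`. This is an alternative formulation and not part of the original answer; the names `mask` and `kept` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4, 5],
    "Position": ["A", "B", "C", "B", "C", "A", "A"],
    "Select": [0, 1, 0, 0, 0, 1, 0],
})

# True for every row whose ID has at least one row with Select == 1.
mask = df["Select"].eq(1).groupby(df["ID"]).transform("any")
kept = df[mask]
print(kept)
```

The mask keeps the original row index (1, 2, 5), so it should select the same rows as the `filter` call above. On large frames a mask built this way often avoids calling a Python lambda once per group, though that is worth measuring on your own data.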

To wide form:

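import pandas as pd

# Number the rows within each ID (1, 2, ...); this becomes the column suffix.
u = u.assign(col=u.groupby("ID").cumcount() + 1)
# One Select value per ID: 1 if any row for that ID is coded 1.
s = u.groupby("ID")["Select"].any().astype(int)

# Spread Position across columns, then flatten ("Position", 1) into "Position1".
u = u[["ID", "col", "Position"]].pivot(index="ID", columns="col")
u.columns = [f"{c[0]}{c[1]}" for c in u.columns]
print(pd.concat([u, s], axis=1).fillna("").reset_index())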

Prints:

   ID Position1 Position2  Select
0   2         B         C       1
1   4         A                 1
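
For reuse, the two steps can be wrapped in one function. The following is a minimal sketch, assuming `Select` is coded 0/1 and that the column names match the question; the function name `filter_and_widen` is hypothetical and does not come from the answer above.

```python
import pandas as pd


def filter_and_widen(df: pd.DataFrame) -> pd.DataFrame:
    """Keep IDs with any Select == 1, then spread Position across columns."""
    # Keep all rows of an ID if any of its rows has Select == 1.
    kept = df[df["Select"].eq(1).groupby(df["ID"]).transform("any")]
    # Row number within each ID (1, 2, ...) becomes the column suffix.
    kept = kept.assign(col=kept.groupby("ID").cumcount() + 1)
    wide = kept.pivot(index="ID", columns="col", values="Position")
    wide.columns = [f"Position{c}" for c in wide.columns]
    # With 0/1 coding, the per-ID maximum is 1 whenever any row is selected.
    wide["Select"] = kept.groupby("ID")["Select"].max()
    # Blank out positions that do not exist for an ID, as in the question.
    return wide.fillna("").reset_index()


# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4, 5],
    "Position": ["A", "B", "C", "B", "C", "A", "A"],
    "Select": [0, 1, 0, 0, 0, 1, 0],
})
result = filter_and_widen(df)
print(result)
```

Because the column suffix comes from `cumcount`, an ID with three or more rows would simply produce `Position3` and so on, with empty strings where an ID has fewer positions.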

huangapple
  • Posted on 2023-08-09 02:07:31
  • Please keep this link when reposting: https://go.coder-hub.com/76862163.html