Python Pandas - How to Keep All Observations By Selector Variable When Data are in Long Form
Question
I am working in Python 3.11.1. I have data like the following stored in a pandas DataFrame:
ID Position Select
1 A 0
2 B 1
2 C 0
3 B 0
3 C 0
4 A 1
5 A 0
Some IDs are recorded in multiple rows, while others appear in only a single row. I need to subset this dataset by keeping every single-row ID whose Select is 1, AND keeping ALL rows of a multi-row ID if ANY of its rows has Select equal to 1. The resulting dataset should look like:
ID Position Select
2 B 1
2 C 0
4 A 1
What is the best way to do this?
Ultimately, I then need to convert the data from long to wide form. Therefore, the final result should be:
ID  Position1  Position2  Select
2   B          C          1
4   A                     1
Thanks in advance.
Answer 1
Score: 2
Try:
u = df.groupby("ID").filter(lambda x: (x.Select == 1).any()) # or just `lambda x: x.Select.any()` if there are only 0/1 values
print(u)
Prints:
   ID Position  Select
1   2        B       1
2   2        C       0
5   4        A       1
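For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and applies the same filter. The `transform`-based mask and the name `u_alt` are my own additions for comparison, not part of the answer above; on this data they should select the same rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 3, 4, 5],
    "Position": ["A", "B", "C", "B", "C", "A", "A"],
    "Select": [0, 1, 0, 0, 0, 1, 0],
})

# Approach from the answer: keep whole groups where any row has Select == 1.
u = df.groupby("ID").filter(lambda g: (g["Select"] == 1).any())

# Alternative (an assumption, not from the answer): broadcast a per-ID
# "any row selected" flag back to every row and use it as a boolean mask.
mask = df["Select"].eq(1).groupby(df["ID"]).transform("any")
u_alt = df[mask]

print(u)
print(u_alt)
```

The mask version avoids calling a Python lambda once per group, which may matter on frames with many distinct IDs.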
To convert to wide form:
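# number the rows within each ID (1, 2, ...) to use as the column suffix
u["col"] = u.groupby("ID").cumcount() + 1
# one Select flag per ID: 1 if any row of that ID is selected
s = u.groupby("ID")["Select"].any().astype(int)
# spread Position across columns, one column per within-ID row number
u = u[["ID", "col", "Position"]].pivot(index="ID", columns="col")
# flatten the ("Position", n) column pairs into "Position1", "Position2", ...
u.columns = [f"{c[0]}{c[1]}" for c in u.columns]
print(pd.concat([u, s], axis=1).fillna("").reset_index())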
Prints:
   ID Position1 Position2  Select
0   2         B         C       1
1   4         A                 1
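The reshaping steps can also be wrapped in a small function. This is only a sketch: `long_to_wide` is a hypothetical helper name, it assumes the column names `ID`, `Position` and `Select` from the question, and it uses `max()` instead of `any()` for the per-ID flag, which is equivalent only if `Select` holds 0/1 values.

```python
import pandas as pd

def long_to_wide(u: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per ID with Position1..PositionN and Select."""
    u = u.copy()  # work on a copy so the caller's frame is not modified
    # Number the rows within each ID: 1, 2, ...
    u["col"] = u.groupby("ID").cumcount() + 1
    # One flag per ID; max() assumes Select contains only 0/1.
    select = u.groupby("ID")["Select"].max()
    # Spread Position across columns keyed by the within-ID row number.
    wide = u.pivot(index="ID", columns="col", values="Position")
    wide.columns = [f"Position{c}" for c in wide.columns]
    # Attach the flag, blank out missing positions, and restore ID as a column.
    return wide.join(select).fillna("").reset_index()

# Filtered rows from the question's expected intermediate result.
filtered = pd.DataFrame({
    "ID": [2, 2, 4],
    "Position": ["B", "C", "A"],
    "Select": [1, 0, 1],
})
result = long_to_wide(filtered)
print(result)
```

Passing `values="Position"` to `pivot` yields flat column labels directly, so no MultiIndex flattening is needed in this variant.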