Python Pandas - How to Keep All Observations By Selector Variable When Data are in Long Form

Question

I am working in Python 3.11.1. I have data like this stored in a pandas DataFrame:

    ID  Position  Select
    1   A         0
    2   B         1
    2   C         0
    3   B         0
    3   C         0
    4   A         1
    5   A         0

Some IDs are recorded in multiple rows, while others appear in only a single row. I need to subset this dataset by keeping every single-row ID coded 1 for Select AND keeping ALL rows of a multi-row ID if ANY one of those rows is coded 1 for Select. The resulting dataset should look like:

    ID  Position  Select
    2   B         1
    2   C         0
    4   A         1

What is the best way to do this?

Ultimately, I then need to convert from long to wide form. Therefore, the final result should be:

    ID  Position1  Position2  Select
    2   B          C          1
    4   A                     1

Thanks in advance.
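
For anyone who wants to reproduce the example, the sample table in the question can be rebuilt roughly as follows. This is only a sketch: the variable name `df` and the integer dtypes for `ID` and `Select` are assumptions, since the question shows the data but not how it was created.

```python
import pandas as pd

# Sample data copied from the question's table.
# The name `df` and the integer dtypes are assumptions.
df = pd.DataFrame(
    {
        "ID": [1, 2, 2, 3, 3, 4, 5],
        "Position": ["A", "B", "C", "B", "C", "A", "A"],
        "Select": [0, 1, 0, 0, 0, 1, 0],
    }
)
print(df)
```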

Answer 1

Score: 2

Try:

    # or just `lambda x: x.Select.any()` if there are only 0/1 values
    u = df.groupby("ID").filter(lambda x: (x.Select == 1).any())
    print(u)

Prints:

       ID Position  Select
    1   2        B       1
    2   2        C       0
    5   4        A       1
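
As a possible alternative to `filter` with a lambda, the same subset can be selected with a boolean mask built by `groupby(...).transform("any")`. The sketch below is not part of the original answer; it rebuilds the question's sample data under the assumed name `df` and should keep the same rows (index 1, 2 and 5).

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption)
df = pd.DataFrame(
    {
        "ID": [1, 2, 2, 3, 3, 4, 5],
        "Position": ["A", "B", "C", "B", "C", "A", "A"],
        "Select": [0, 1, 0, 0, 0, 1, 0],
    }
)

# True for every row whose ID has at least one row with Select == 1
mask = df["Select"].eq(1).groupby(df["ID"]).transform("any")

# Boolean indexing keeps the original row order and index labels
u = df[mask]
print(u)
```

A mask like this avoids calling a Python function once per group, which may matter on frames with many distinct IDs.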

To convert to wide form:

    u["col"] = u.groupby("ID").cumcount() + 1  # row number within each ID
    s = u.groupby("ID")["Select"].any().astype(int)  # 1 if any row of the ID is selected
    u = u[["ID", "col", "Position"]].pivot(index="ID", columns="col")
    u.columns = [f"{c[0]}{c[1]}" for c in u.columns]  # ("Position", 1) -> "Position1"
    print(pd.concat([u, s], axis=1).fillna("").reset_index())

Prints:

       ID Position1 Position2  Select
    0   2         B         C       1
    1   4         A                 1
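
Putting both steps together, a self-contained version might look like the sketch below. It follows the answer's approach but is a variation, not the answer's exact code: the names `df`, `n` and `wide` are assumptions, `.copy()` is added so the helper column is written to an independent frame, and `pivot` is called with `values="Position"` so the resulting columns are flat.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption)
df = pd.DataFrame(
    {
        "ID": [1, 2, 2, 3, 3, 4, 5],
        "Position": ["A", "B", "C", "B", "C", "A", "A"],
        "Select": [0, 1, 0, 0, 0, 1, 0],
    }
)

# Step 1: keep all rows of any ID that has at least one Select == 1
u = df.groupby("ID").filter(lambda g: (g["Select"] == 1).any()).copy()

# Step 2: number the rows within each ID (1, 2, ...)
u["n"] = u.groupby("ID").cumcount() + 1

# Step 3: one Position column per within-ID row number
wide = u.pivot(index="ID", columns="n", values="Position")
wide.columns = [f"Position{n}" for n in wide.columns]

# Step 4: one Select value per ID, aligned on the ID index
wide["Select"] = u.groupby("ID")["Select"].max()

# Blank out missing positions and turn ID back into a column
wide = wide.fillna("").reset_index()
print(wide)
```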

huangapple
  • Posted on August 9, 2023 at 02:07:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862163.html