Python: Table where identical ID/Numbers with different values to being them on one line where the different values are appended to the right

huangapple go评论94阅读模式
英文:

Python: Table where identical ID/Numbers with different values to being them on one line where the different values are appended to the right

问题

我有一个带有一些在多行上相同的ID的Pandas表格,但分配的值不同。如何将ID仅显示一次在一行上,并将各种值附加在多个列中?

起始点:

ID Column 1
1 blue
1 red
2 gray
3 yellow
4 orange
1 pink
2 white

期望的解决方案:

ID Column 1 Column 2 Column 3
1 blue red pink
2 gray white
3 yellow
4 orange
英文:

I have a Pandas Table with some IDs that are identical on several lines but the assigned value is different. How is it possible to get a result where the ID is only shown once on one line and append the various values in multiple columns?

Starting point:

ID Column 1
1 blue
1 red
2 gray
3 yellow
4 orange
1 pink
2 white

Desired solution:

ID Column 1 Column 2 Column 3
1 blue red pink
2 gray white
3 yellow
4 orange

答案1

得分: 0

按照ID分组,然后计算唯一的数值

  1. df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()
英文:

Groupby the ID and then compute the unique values

  1. df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()

答案2

得分: 0

你可以使用向量化的方式重塑你的数据框架:

  1. (df.assign(col=df.groupby('ID').cumcount().add(1))
  2. .set_index(['ID', 'col'])['Column 1']
  3. .unstack('col').add_prefix('Column ')
  4. .reset_index().rename_axis(columns=None))
  5. ID Column 1 Column 2 Column 3
  6. 0 1 blue red pink
  7. 1 2 gray white NaN
  8. 2 3 yellow NaN NaN
  9. 3 4 orange NaN NaN

使用 pivot_table

  1. (df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
  2. columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
  3. .reset_index())
  4. ID Column 1 Column 2 Column 3
  5. 0 1 blue red pink
  6. 1 2 gray white
  7. 2 3 yellow
  8. 3 4 orange
英文:

You can reshape your dataframe in a vectorized way:

  1. >>> (df.assign(col=df.groupby('ID').cumcount().add(1))
  2. .set_index(['ID', 'col'])['Column 1']
  3. .unstack('col').add_prefix('Column ')
  4. .reset_index().rename_axis(columns=None))
  5. ID Column 1 Column 2 Column 3
  6. 0 1 blue red pink
  7. 1 2 gray white NaN
  8. 2 3 yellow NaN NaN
  9. 3 4 orange NaN NaN

With pivot_table:

  1. >>> (df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
  2. columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
  3. .reset_index())
  4. ID Column 1 Column 2 Column 3
  5. 0 1 blue red pink
  6. 1 2 gray white
  7. 2 3 yellow
  8. 3 4 orange

huangapple
  • 本文由 发表于 2023年2月8日 22:22:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75387144.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定