Python: Table where identical ID/Numbers with different values to being them on one line where the different values are appended to the right

huangapple go评论62阅读模式
英文:

Python: Table where identical ID/Numbers with different values to being them on one line where the different values are appended to the right

问题

我有一个带有一些在多行上相同的ID的Pandas表格,但分配的值不同。如何将ID仅显示一次在一行上,并将各种值附加在多个列中?

起始点:

ID Column 1
1 blue
1 red
2 gray
3 yellow
4 orange
1 pink
2 white

期望的解决方案:

ID Column 1 Column 2 Column 3
1 blue red pink
2 gray white
3 yellow
4 orange
英文:

I have a Pandas Table with some IDs that are identical on several lines but the assigned value is different. How is it possible to get a result where the ID is only shown once on one line and append the various values in multiple columns?

Starting point:

ID Column 1
1 blue
1 red
2 gray
3 yellow
4 orange
1 pink
2 white

Desired solution:

ID Column 1 Column 2 Column 3
1 blue red pink
2 gray white
3 yellow
4 orange

答案1

得分: 0

按照ID分组,然后计算唯一的数值

df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()
英文:

Groupby the ID and then compute the unique values

df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()

答案2

得分: 0

你可以使用向量化的方式重塑你的数据框架:

(df.assign(col=df.groupby('ID').cumcount().add(1))
   .set_index(['ID', 'col'])['Column 1']
   .unstack('col').add_prefix('Column ')
   .reset_index().rename_axis(columns=None))

   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white      NaN
2   3   yellow      NaN      NaN
3   4   orange      NaN      NaN

使用 pivot_table

(df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
               columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
  .reset_index())

   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white        
2   3   yellow                  
3   4   orange                  
英文:

You can reshape your dataframe in a vectorized way:

>>> (df.assign(col=df.groupby('ID').cumcount().add(1))
       .set_index(['ID', 'col'])['Column 1']
       .unstack('col').add_prefix('Column ')
       .reset_index().rename_axis(columns=None))

   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white      NaN
2   3   yellow      NaN      NaN
3   4   orange      NaN      NaN

With pivot_table:

>>> (df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
                   columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
      .reset_index())

   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white         
2   3   yellow                  
3   4   orange                  

huangapple
  • 本文由 发表于 2023年2月8日 22:22:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75387144.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定