Python: Collapse rows that share an ID into one row, with the differing values appended as columns to the right
Question
I have a pandas DataFrame in which some IDs appear on several rows, each time with a different value. How can I get a result where each ID appears only once, on a single row, with its values spread across multiple columns?
Starting point:
| ID | Column 1 | 
|---|---|
| 1 | blue | 
| 1 | red | 
| 2 | gray | 
| 3 | yellow | 
| 4 | orange | 
| 1 | pink | 
| 2 | white | 
Desired solution:
| ID | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 1 | blue | red | pink |
| 2 | gray | white | |
| 3 | yellow | | |
| 4 | orange | | |
Answer 1
Score: 0
Group by ID, collect each group's unique values, and unstack them into columns:
df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()
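For reference, here is a minimal self-contained sketch of this approach on the sample data from the question. The variable name `wide` and the column-renaming step are additions for illustration, not part of the answer. Note that `unique()` drops values that repeat within an ID, so this only matches the desired output when each ID's values are distinct.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 1, 2],
    "Column 1": ["blue", "red", "gray", "yellow", "orange", "pink", "white"],
})

# One Series of unique values per ID, then pivot the inner index level into columns
wide = (
    df.groupby("ID")["Column 1"]
      .apply(lambda s: pd.Series(s.unique()))
      .unstack()
)

# The unstacked columns are labelled 0, 1, 2; rename them to match the desired layout
wide.columns = [f"Column {i + 1}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

IDs with fewer values than the widest group end up with missing values (NaN) in the trailing columns.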
Answer 2
Score: 0
You can reshape your DataFrame in a vectorized way, numbering each ID's occurrences with cumcount and unstacking on that counter:
>>> (df.assign(col=df.groupby('ID').cumcount().add(1))
       .set_index(['ID', 'col'])['Column 1']
       .unstack('col').add_prefix('Column ')
       .reset_index().rename_axis(columns=None))
   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white      NaN
2   3   yellow      NaN      NaN
3   4   orange      NaN      NaN
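To see why this works, it may help to look at the intermediate counter on its own. The sketch below rebuilds the sample data from the question and prints the per-ID position before reshaping; the names `position` and `wide` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 1, 2],
    "Column 1": ["blue", "red", "gray", "yellow", "orange", "pink", "white"],
})

# cumcount numbers the rows within each ID group starting at 0; add(1) makes it 1-based
position = df.groupby("ID").cumcount().add(1)
print(position.tolist())  # [1, 2, 1, 1, 1, 3, 2]

wide = (
    df.assign(col=position)
      .set_index(["ID", "col"])["Column 1"]  # (ID, position) uniquely identifies each value
      .unstack("col")                        # positions become columns 1, 2, 3
      .add_prefix("Column ")
      .reset_index()
      .rename_axis(columns=None)             # drop the leftover "col" axis name
)
print(wide)
```

Because the counter is based on position rather than on the values themselves, repeated values within an ID are kept.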
With pivot_table:
>>> (df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
                   columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
      .reset_index())
   ID Column 1 Column 2 Column 3
0   1     blue      red     pink
1   2     gray    white         
2   3   yellow                  
3   4   orange                  
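A self-contained variant of the pivot_table version, as a sketch: here the column key is stored in a helper column named `col` (a name chosen for illustration) instead of being passed inline. One caveat worth keeping in mind is that the keys are strings, so with ten or more values per ID the columns would sort lexicographically ("Column 10" before "Column 2").

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 1, 2],
    "Column 1": ["blue", "red", "gray", "yellow", "orange", "pink", "white"],
})

# Build the target column label for every row: "Column 1", "Column 2", ...
keyed = df.assign(col="Column " + df.groupby("ID").cumcount().add(1).astype(str))

wide = (
    keyed.pivot_table(index="ID", columns="col", values="Column 1",
                      aggfunc="first",   # each (ID, col) pair holds exactly one value
                      fill_value="")     # empty string instead of NaN for missing cells
         .reset_index()
         .rename_axis(columns=None)
)
print(wide)
```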