英文:
Python: Table where identical ID/Numbers with different values to being them on one line where the different values are appended to the right
问题
我有一个带有一些在多行上相同的ID的Pandas表格,但分配的值不同。如何将ID仅显示一次在一行上,并将各种值附加在多个列中?
起始点:
ID | Column 1 |
---|---|
1 | blue |
1 | red |
2 | gray |
3 | yellow |
4 | orange |
1 | pink |
2 | white |
期望的解决方案:
ID | Column 1 | Column 2 | Column 3 |
---|---|---|---|
1 | blue | red | pink |
2 | gray | white | |
3 | yellow | ||
4 | orange |
英文:
I have a Pandas Table with some IDs that are identical on several lines but the assigned value is different. How is it possible to get a result where the ID is only shown once on one line and append the various values in multiple columns?
Starting point:
ID | Column 1 |
---|---|
1 | blue |
1 | red |
2 | gray |
3 | yellow |
4 | orange |
1 | pink |
2 | white |
Desired solution:
ID | Column 1 | Column 2 | Column 3 |
---|---|---|---|
1 | blue | red | pink |
2 | gray | white | |
3 | yellow | ||
4 | orange |
答案1
得分: 0
按照ID分组,然后计算唯一的数值
df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()
英文:
Groupby the ID and then compute the unique values
df.groupby("ID")["Column 1"].apply(lambda x: pd.Series(x.unique())).unstack()
答案2
得分: 0
你可以使用向量化的方式重塑你的数据框架:
(df.assign(col=df.groupby('ID').cumcount().add(1))
.set_index(['ID', 'col'])['Column 1']
.unstack('col').add_prefix('Column ')
.reset_index().rename_axis(columns=None))
ID Column 1 Column 2 Column 3
0 1 blue red pink
1 2 gray white NaN
2 3 yellow NaN NaN
3 4 orange NaN NaN
使用 pivot_table
:
(df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
.reset_index())
ID Column 1 Column 2 Column 3
0 1 blue red pink
1 2 gray white
2 3 yellow
3 4 orange
英文:
You can reshape your dataframe in a vectorized way:
>>> (df.assign(col=df.groupby('ID').cumcount().add(1))
.set_index(['ID', 'col'])['Column 1']
.unstack('col').add_prefix('Column ')
.reset_index().rename_axis(columns=None))
ID Column 1 Column 2 Column 3
0 1 blue red pink
1 2 gray white NaN
2 3 yellow NaN NaN
3 4 orange NaN NaN
With pivot_table
:
>>> (df.pivot_table(index='ID', values='Column 1', aggfunc='first', fill_value='',
columns='Column ' + df.groupby('ID').cumcount().add(1).astype(str))
.reset_index())
ID Column 1 Column 2 Column 3
0 1 blue red pink
1 2 gray white
2 3 yellow
3 4 orange
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论