英文:
aggregate pivot in pandas with multiple repeated fields
问题
我有一个数据框,看起来像这样:
id Field_name Field_value
1 consent yes
1 _REACTION TIME_ 5547
1 age 24
1 gender X
1 _REACTION TIME_ 45396
1 education uni
1 language EN
1 _REACTION TIME_ 105187
2 consent yes
2 _REACTION TIME_ 3547
2 age 25
2 gender F
2 _REACTION TIME_ 42396
2 education uni
2 language EU
2 _REACTION TIME_ 115427
我想将其按每个id一行的方式排列,每个 _REACTION TIME_
行作为不同的列,如下所示:
id consent _REACTION TIME_1 age gender _REACTION TIME_2 education language _REACTION TIME_3
1 yes 5547 24 X 45396 uni EN 105187
2 yes 3547 25 F 42396 uni EU 115427
英文:
I have a dataframe that looks like this:
id Field_name Field_value
1 consent yes
1 _REACTION TIME_ 5547
1 age 24
1 gender X
1 _REACTION TIME_ 45396
1 education uni
1 language EN
1 _REACTION TIME_ 105187
2 consent yes
2 _REACTION TIME_ 3547
2 age 25
2 gender F
2 _REACTION TIME_ 42396
2 education uni
2 language EU
2 _REACTION TIME_ 115427
and I would like to have it as a row per id, with every _REACTION TIME_
row being a different column, such as:
id consent _REACTION TIME_1 age gender _REACTION TIME_2 education language _REACTION TIME_3
1 yes 5547 24 X 45396 uni EN 105187
2 yes 3547 25 F 42396 uni EU 115427
I have been looking for an answer to this all over SO but I can't find it for this particular issue when only some of the entries are repeated, but they are repeated multiple times.
Thanks in advance!
答案1
得分: 2
使用GroupBy.cumcount
仅适用于由DataFrame.duplicated
检测到的重复行,因此可以通过DataFrame.pivot
进行可能的数据透视操作,最后,按原始列的顺序添加DataFrame.reindex
:
m = df.duplicated(['id', 'Field_name'], keep=False)
df.loc[m, 'Field_name'] += df[m].groupby(['id', 'Field_name']).cumcount().add(1).astype(str)
cols = df['Field_name'].unique()
df = df.pivot(index='id', columns='Field_name', values='Field_value').reindex(cols, axis=1)
print(df)
解决方案避免覆盖原始DataFrame,操作类似:
m = df.duplicated(['id', 'Field_name'], keep=False)
s = df['Field_name'].add(df.groupby(['id', 'Field_name']).cumcount().add(1)
.astype(str)).where(m, df['Field_name'])
df1 = (df.assign(Field_name=s)
.pivot(index='id', columns='Field_name', values='Field_value')
.reindex(s.unique(), axis=1))
print(df1)
希望这有所帮助。
英文:
Use GroupBy.cumcount
only for rows with duplicates by DataFrame.duplicated
, so possible pivoting by DataFrame.pivot
, last for original order of columns add DataFrame.reindex
:
m = df.duplicated(['id','Field_name'], keep=False)
df.loc[m, 'Field_name'] += df[m].groupby(['id','Field_name']).cumcount().add(1).astype(str)
cols = df['Field_name'].unique()
df = df.pivot(index='id', columns='Field_name', values='Field_value').reindex(cols, axis=1)
print (df)
Field_name consent _REACTION TIME_1 age gender _REACTION TIME_2 education \
id
1 yes 5547 24 X 45396 uni
2 yes 3547 25 F 42396 uni
Field_name language _REACTION TIME_3
id
1 EN 105187
2 EU 115427
Solution avoiding overwrite original DataFrame is similar:
m = df.duplicated(['id','Field_name'], keep=False)
s = df['Field_name'].add(df.groupby(['id','Field_name']).cumcount().add(1)
.astype(str)).where(m, df['Field_name'])
df1 = (df.assign(Field_name=s)
.pivot(index='id', columns='Field_name', values='Field_value')
.reindex(s.unique(), axis=1))
print (df1)
Field_name consent _REACTION TIME_1 age gender _REACTION TIME_2 education \
id
1 yes 5547 24 X 45396 uni
2 yes 3547 25 F 42396 uni
Field_name language _REACTION TIME_3
id
1 EN 105187
2 EU 115427
答案2
得分: 1
If you want to remain _REACTION TIME_
instead of renaming it as _REACTION TIME_1
in column header, you can do groupby.apply
out = (df.groupby('id').apply(lambda g: g.drop('id', axis=1).set_index('Field_name').T)
.reset_index(level=0).reset_index(drop=True)
.rename_axis('', axis=1))
print(out)
id consent _REACTION_TIME_ age gender _REACTION_TIME_ education language _REACTION_TIME_
0 1 yes 5547 24 X 45396 uni EN 105187
1 2 yes 3547 25 F 42396 uni EU 115427
英文:
If you want to remain _REACTION TIME_
instead of renaming it as _REACTION TIME_1
in column header, you can do groupby.apply
out = (df.groupby('id').apply(lambda g: g.drop('id', axis=1).set_index('Field_name').T)
.reset_index(level=0).reset_index(drop=True)
.rename_axis('', axis=1))
print(out)
id consent _REACTION_TIME_ age gender _REACTION_TIME_ education language _REACTION_TIME_
0 1 yes 5547 24 X 45396 uni EN 105187
1 2 yes 3547 25 F 42396 uni EU 115427
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论