Pandas groupby based on row values in one column
Question
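I have data in the format below:
However, I want to group the CSV data by sheet_internal_id so that I get the desired result:
I tried the following code:
df_1 = df.fillna('').groupby(['sheet_internal_id','nct_id'])['stat_design'].apply(','.join).reset_index()
but this does not give the desired output.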
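Judging from the answer's printed output below, the desired result appears to keep every original row. Here is a minimal sketch of why the attempt above falls short in that case. The sample values are hypothetical, and the columns are spelled NCT_ID / Stat_design as in the answer's code rather than nct_id / stat_design as in the question. apply(','.join) aggregates, so each (sheet_internal_id, NCT_ID) group collapses to a single row. transform(','.join) computes the same joined strings but returns one value per original row.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer's spelling.
df = pd.DataFrame({
    'sheet_internal_id': [1, 1, 1, 2, 3, 3],
    'NCT_ID': [101, 102, 102, 105, 106, 106],
    'Stat_design': ['Superior', 'Non-Infer', 'Superior', 'Othr', 'Superior', 'Superior'],
})

# The question's approach: apply(','.join) aggregates, so every
# (sheet_internal_id, NCT_ID) group becomes exactly one output row.
agg = (df.fillna('')
         .groupby(['sheet_internal_id', 'NCT_ID'])['Stat_design']
         .apply(','.join)
         .reset_index())
print(agg)

# transform(','.join) builds the same joined strings but broadcasts each one
# back to all rows of its group, so the result is aligned with df's index.
per_row = df.groupby(['sheet_internal_id', 'NCT_ID'])['Stat_design'].transform(','.join)
print(per_row)
```

With this sample, agg has 4 rows (one per group) while per_row has 6 (one per input row).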
Answer 1
Score: 1
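Use GroupBy.transform to join the Stat_design values of each (sheet_internal_id, NCT_ID) group onto every row of that group. Then use DataFrame.mask with DataFrame.duplicated to set the repeated rows to empty strings: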
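# rows that have a Stat_design value (','.join fails on NaN)
m = df['Stat_design'].notna()
# join Stat_design per (sheet_internal_id, NCT_ID) group into a new stat_design column; transform repeats the joined string on each row of the group
df.loc[m, 'stat_design'] = (df[m].groupby(['sheet_internal_id','NCT_ID'])['Stat_design']
                              .transform(','.join))
# blank out every repeat of a (sheet_internal_id, NCT_ID) pair, keeping only the first row
df = df.mask(df.duplicated(['sheet_internal_id','NCT_ID']),'')
print(df)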
    sheet_internal_id  NCT_ID  Stat_design         stat_design
0                   1   101.0     Superior            Superior
1                   1   102.0    Non-Infer  Non-Infer,Superior
2
3                   2   105.0         Othr                Othr
4                   3   106.0     Superior   Superior,Superior
5
6                   4   107.0        Other               Other
7                   5     NaN          NaN                 NaN
8                   6   110.0        Other               Other
9                   7   110.0    Non-Infer           Non-Infer
10                  7   111.0    Non-Infer  Non-Infer,Superior
11
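For a self-contained run, here is the answer's code with a hypothetical input frame. The input is an assumption reconstructed from the printed output above. Rows 2, 5 and 11 are taken to be the Superior rows that the joined strings imply, which mask then blanked. NCT_ID is numeric with one NaN, which matches the 101.0-style floats in the output.

```python
import numpy as np
import pandas as pd

# Hypothetical input reconstructed from the answer's output (an assumption):
# rows 2, 5 and 11 are the 'Superior' duplicates that end up blanked.
df = pd.DataFrame({
    'sheet_internal_id': [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 7],
    'NCT_ID': [101, 102, 102, 105, 106, 106, 107, np.nan, 110, 110, 111, 111],
    'Stat_design': ['Superior', 'Non-Infer', 'Superior', 'Othr', 'Superior',
                    'Superior', 'Other', np.nan, 'Other', 'Non-Infer',
                    'Non-Infer', 'Superior'],
})

# Only rows with a Stat_design value take part in the join (NaN is not a str).
m = df['Stat_design'].notna()

# transform returns one value per original row: the comma-joined Stat_design
# values of that row's (sheet_internal_id, NCT_ID) group. Rows outside m get NaN.
df.loc[m, 'stat_design'] = (df[m].groupby(['sheet_internal_id', 'NCT_ID'])['Stat_design']
                              .transform(','.join))

# duplicated() flags every repeat of a (sheet_internal_id, NCT_ID) pair after
# its first occurrence; mask() replaces all values in those rows with ''.
df = df.mask(df.duplicated(['sheet_internal_id', 'NCT_ID']), '')
print(df)
```

The row count stays at 12. Only the first row of each pair carries the joined string, and the NaN row (sheet_internal_id 5) is left untouched.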