Pandas groupby on basis of row values in one column

Question

I have data in the format below:

[image: input data]

However, I want to group the CSV data by sheet_internal_id so that I get the desired result:

[image: desired result]

I tried the following code:

df_1 = df.fillna('').groupby(['sheet_internal_id','nct_id'])['stat_design'].apply(','.join).reset_index()

but it does not give the desired output.

Answer 1

Score: 1

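Use GroupBy.transform to join the values within each group, then DataFrame.mask with DataFrame.duplicated to set the duplicated rows to empty strings:

# Rows that actually have a Stat_design value
m = df['Stat_design'].notna()
# Join Stat_design per (sheet_internal_id, NCT_ID) group into a new stat_design column
df.loc[m, 'stat_design'] = (df[m].groupby(['sheet_internal_id','NCT_ID'])['Stat_design']
                                 .transform(','.join))

# Blank out every repeated (sheet_internal_id, NCT_ID) row, keeping the first
df = df.mask(df.duplicated(['sheet_internal_id','NCT_ID']),'')
print(df)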
   sheet_internal_id NCT_ID Stat_design         stat_design
0                  1  101.0    Superior            Superior
1                  1  102.0   Non-Infer  Non-Infer,Superior
2                                                          
3                  2  105.0        Othr                Othr
4                  3  106.0    Superior   Superior,Superior
5                                                          
6                  4  107.0       Other               Other
7                  5    NaN         NaN                 NaN
8                  6  110.0       Other               Other
9                  7  110.0   Non-Infer           Non-Infer
10                 7  111.0   Non-Infer  Non-Infer,Superior
11                                                         
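
For anyone who wants to try the approach above end to end, here is a minimal self-contained sketch. The sample rows are made up for illustration (the original input was only posted as an image), so treat the data as an assumption; the column names follow the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's actual CSV.
df = pd.DataFrame({
    "sheet_internal_id": [1, 1, 1, 2, 3, 3, 5],
    "NCT_ID": [101.0, 102.0, 102.0, 105.0, 106.0, 106.0, np.nan],
    "Stat_design": ["Superior", "Non-Infer", "Superior", "Othr",
                    "Superior", "Superior", np.nan],
})

# Only rows with a non-missing Stat_design take part in the join.
m = df["Stat_design"].notna()

# transform broadcasts the joined string back to every row of its group.
df.loc[m, "stat_design"] = (
    df[m]
    .groupby(["sheet_internal_id", "NCT_ID"])["Stat_design"]
    .transform(",".join)
)

# Replace every repeated (sheet_internal_id, NCT_ID) row with empty strings.
out = df.mask(df.duplicated(["sheet_internal_id", "NCT_ID"]), "")
print(out)
```

With this sample, the first row of each (sheet_internal_id, NCT_ID) pair should carry the comma-joined values, and the repeated rows should come out blank. Note that mask with '' turns the numeric columns into object dtype, which is fine for display or export but worth keeping in mind if you need to do arithmetic afterwards.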

huangapple
  • Posted on July 3, 2023 at 15:15:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76602593.html