Tricky groupby several columns of a similar prefix while taking the sum based off of categorical values within a column (Pandas)
Question
I want to group rows whose name values share the same prefix (the part before the colon, e.g. AA in AA:3401) and sum size for each category in the type column, including rows where type is missing.
Data
name      type    size
AA:3400            5
AA:3401   FALSE    1
AA:3402   FALSE    2
AA:3404   FALSE    0
AA:3409   FALSE    1
AA:3410   FALSE    8
AA:3412   FALSE    9
BB:3400   TRUE     4
BB:3401   FALSE    7
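To reproduce the example, the sample data can be built as below. This is a sketch that makes two assumptions. First, type holds the strings "TRUE"/"FALSE", which are the categories the first answer expects. Second, the blank type cell for AA:3400 is a missing value.

```python
import pandas as pd

# Sample data from the question. The blank type for AA:3400 is
# assumed to be a missing value (None is treated as NaN by pandas).
df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [None, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})
print(df)
```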
Desired output
name    type    size
AA      TRUE    0
AA      FALSE   21
AA              5
BB      TRUE    4
BB      FALSE   7
BB
What I'm doing now:
df.groupby(['name', 'type'], dropna=False, as_index=False)['size'].sum()
This groups on the full name value, though. How can I group by the shared prefix instead? Any suggestion is appreciated.
Answer 1
Score: 3
You can try:
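import pandas as pd

# observed=False keeps empty category combinations (e.g. AA/TRUE -> 0);
# relying on the implicit default is deprecated in recent pandas versions.
out = (
    df.assign(type=df["type"].astype(
        pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)))
      .groupby([df["name"].str.split(":").str[0], "type"],
               dropna=False, observed=False)["size"].sum().reset_index()
)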
Output:
print(out)
  name   type  size
0   AA   TRUE     0
1   AA  FALSE    21
2   AA    NaN     5
3   BB   TRUE     4
4   BB  FALSE     7
5   BB    NaN     0
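Here is an end-to-end version of the approach above that runs on its own. It uses the same assumptions as before: type holds the strings "TRUE"/"FALSE", and the blank type is NaN. It passes observed=False explicitly so that empty combinations such as AA/TRUE still appear with a sum of 0. It also passes dropna=False so the missing-type rows form their own group. The split uses n=1 so only the first colon matters. This sketch targets pandas 2.x or later.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [None, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})

# Ordered categorical so TRUE sorts before FALSE in the result.
type_dtype = pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)

# Grouping key: everything before the first colon, e.g. "AA:3401" -> "AA".
prefix = df["name"].str.split(":", n=1).str[0]

out = (
    df.assign(type=df["type"].astype(type_dtype))
      .groupby([prefix, "type"], dropna=False, observed=False)["size"]
      .sum()
      .reset_index()
)
print(out)
```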
Answer 2
Score: 3
Much like @Timeless's solution, I'd do it like this (note that .str[:2] assumes every prefix is exactly two characters long):
df['type'] = df['type'].astype('category')
df_out = df.groupby([df['name'].str[:2], 'type'], 
                    dropna=False, 
                    observed=False)['size'].sum().reset_index()
print(df_out)
Output:
  name   type  size
0   AA  False    21
1   AA   True     0
2   AA    NaN     5
3   BB  False     7
4   BB   True     4
5   BB    NaN     0
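Slicing with .str[:2] works for the sample data because every prefix there is two characters long. If prefixes can vary in length, splitting on the colon is a safer grouping key. The names and sizes below are hypothetical, chosen only to show the difference between the two keys.

```python
import pandas as pd

# Hypothetical names whose prefixes have different lengths.
names = pd.Series(["AA:3400", "ABC:3401", "AB:3402"], name="name")
sizes = pd.Series([5, 1, 2])

by_slice = names.str[:2]                      # fixed width: first two characters
by_split = names.str.split(":", n=1).str[0]   # everything before the first colon

# With the slice, "ABC:3401" and "AB:3402" collapse into one "AB" group.
print(sizes.groupby(by_slice).sum())
print(sizes.groupby(by_split).sum())
```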