Adding a total to the bottom of MultiIndex groups

Question

I am trying to add a total row to each group of my MultiIndex DataFrame:

                          Count
state    car   status
texas    civic New           11
               undamaged     11
               damaged       10
               totalled       5
virginia civic New           10
               undamaged     20
               damaged       10
               totalled       5

I want it to look like:

                          Count
state    car   status
texas    civic New           11
               damaged       10
               totalled       5
               undamaged     11
               total         37
virginia civic New           10
               damaged       10
               totalled       5
               undamaged     20
               total         45

I have tried

s = test.groupby(level=[0,1]).sum()
s.index = pd.MultiIndex.from_product([ ... ])
df_out = df_full.append(s).sort_index()

but it throws

> NotImplementedError: isna is not defined for MultiIndex
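
For anyone who wants to reproduce this, the input frame can be rebuilt roughly as follows. This is only a sketch: the question shows the printed frame rather than the code that created it, so the use of `pd.MultiIndex.from_product` here and the variable name `df` are assumptions.

```python
import pandas as pd

# Assumed reconstruction of the frame printed in the question.
# from_product varies the last level fastest, which matches the printed row order.
index = pd.MultiIndex.from_product(
    [["texas", "virginia"], ["civic"], ["New", "undamaged", "damaged", "totalled"]],
    names=["state", "car", "status"],
)
df = pd.DataFrame({"Count": [11, 11, 10, 5, 10, 20, 10, 5]}, index=index)
print(df)
```

The per-group sums of this frame are 37 for texas and 45 for virginia, matching the totals in the desired output.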

Answer 1

Score: 0

Your problem is that pd.MultiIndex.from_product doesn't support taking the product of a MultiIndex and a list. Instead, you can use pd.MultiIndex.from_frame:

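import pandas as pd

# Sum within each (state, car) group, then label the status level 'total'
s = df.groupby(level=[0,1]).sum()
s.index = pd.MultiIndex.from_frame(s.index.to_frame().assign(status='total'))
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
out = pd.concat([df, s]).sort_index()
print(out)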

                          Count
state    car   status
texas    civic New           11
               damaged       10
               total         37
               totalled       5
               undamaged     11
virginia civic New           10
               damaged       10
               total         45
               totalled       5
               undamaged     20

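However, .sort_index() sorts the status labels alphabetically, so the total row lands in the middle of each group instead of at the bottom. To keep it last, you can try the following instead: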

df_ = df['Count'].unstack()
df_['total'] = df_.sum(axis=1)
df_ = df_.stack().to_frame('Count')
# or in one line
df_ = (df['Count'].unstack()
       .pipe(lambda d: d.assign(total=d.sum(axis=1)))
       .stack().to_frame('Count'))
print(df_)

                          Count
state    car   status
texas    civic New           11
               damaged       10
               totalled       5
               undamaged     11
               total         37
virginia civic New           10
               damaged       10
               totalled       5
               undamaged     20
               total         45
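
Note that `unstack()` sorts the status labels, so the rows above come back in alphabetical order rather than in the order of the original frame. If the original order matters, one possible alternative is to walk the groups and append a total row after each. The following is a sketch under assumptions: `add_group_totals` is a hypothetical helper, not a pandas function, and it assumes the grouping covers every index level except the last one. The frame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
index = pd.MultiIndex.from_product(
    [["texas", "virginia"], ["civic"], ["New", "undamaged", "damaged", "totalled"]],
    names=["state", "car", "status"],
)
df = pd.DataFrame({"Count": [11, 11, 10, 5, 10, 20, 10, 5]}, index=index)


def add_group_totals(frame, group_levels=(0, 1), label="total"):
    """Hypothetical helper: append a `label` row after each group.

    Assumes `group_levels` covers every index level except the last.
    """
    pieces = []
    # sort=False keeps the groups in their order of first appearance
    for key, group in frame.groupby(level=list(group_levels), sort=False):
        total_row = group.sum().to_frame().T  # one-row frame of column sums
        total_row.index = pd.MultiIndex.from_tuples(
            [(*key, label)], names=frame.index.names
        )
        pieces.extend([group, total_row])
    return pd.concat(pieces)


out = add_group_totals(df)
print(out)
```

Because nothing is sorted, each group keeps its rows in the order New, undamaged, damaged, totalled, followed by its total.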

Answer 2

Score: 0

An easy way to do this, which I use in my own workflow, is the sidetables package.

For example, test.groupby(level=[0,1]).sum().stb.subtotal(sub_level=2) should accomplish what you're looking for.

huangapple
  • Posted on March 31, 2023, 21:55:59
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75899360.html