将一个DataFrame的多层索引中的一部分作为列应用,是否可能?

huangapple go评论94阅读模式
英文:

Is it possible to take one portion of a multi-index from the rows of a dataframe and apply it as columns

问题

我有一个带有多级索引行和单个索引列的数据框。"foo"只是第一个实体类型,我会有其他实体类型,这将使数据框更长且难以查看。

我想将聚合方法的索引列移动到列中,以便我有一个如下的表格:

实体类型 分位数范围 bar baz
max avg std total max avg std total
foo q99-q100
foo q90-q99
foo q70-q90
foo q0-q70
quix q99-q100
quix q90-q99
quix q70-q90
quix q0-q70
maz q99-q100
maz q90-q99
maz q70-q90
maz q0-q70

有没有一种简单的命令可以做到这一点?类似于Melt或重置索引的操作?我甚至不确定从哪里开始搜索答案。

英文:

I have a dataframe with multi-indexed rows and a single index columns. "foo" is just the first EntityType, I will have others which will make this dataframe longer and harder to view.

将一个DataFrame的多层索引中的一部分作为列应用,是否可能?

I'd like to take the aggregation method index column and move it to the columns so I have a table like this

Entity Type Quantile Range bar baz
max avg std total max avg std total
foo q99-q100
foo q90-q99
foo q70-q90
foo q0-q70
quix q99-q100
quix q90-q99
quix q70-q90
quix q0-q70
maz q99-q100
maz q90-q99
maz q70-q90
maz q0-q70

Is there an easy command to do this? Something like Melt or reset index? I'm just not even sure where to being googling to get the answer.

答案1

得分: 1

以下是如何使用Pandas进行操作的示例,正如评论中建议的那样,以及重新索引:

import random

import pandas as pd

# Toy dataframe (same as yours after resetting the index)
df = pd.DataFrame(
    {
        "Entity Type": ["foo" for _ in range(20)],
        "Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
        "AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
        "bar": [random.uniform(1, 999) for _ in range(20)],
        "baz": [random.uniform(1, 999) for _ in range(20)],
    }
)
df = df.pivot(
    index=["Entity Type", "Quantile Range"],
    columns="AggregationMethod",
    values=["bar", "baz"],
).reindex(["max", "avg", "std", "total"], level=1, axis=1)
print(df)
# Output

                                   bar
AggregationMethod                  max         avg         std       total   
Entity Type Quantile Range
foo         q00_q70         141.752307  822.270987  199.740853  595.444166  \
            q70_q90         383.574450  410.730888  838.562828  545.299705   
            q90_q99         339.588340  606.983173  935.142608  407.674059   
            q99_q100        161.833517  932.267262  157.149458  618.105967   

                                   baz
AggregationMethod                  max         avg         std       total  
Entity Type Quantile Range
foo         q00_q70          17.986766  298.760555  389.559554   49.925246  
            q70_q90         888.435092  695.713473  502.429534  209.356226  
            q90_q99         715.425998  209.749918  136.480141  525.729657  
            q99_q100        705.721265  956.273655  684.883477   39.114393 
英文:

Here is an example of how to do it with Pandas pivot, as suggested in the comments, and reindex:

import random

import pandas as pd

# Toy dataframe (same as yours after reseting the index)
df = pd.DataFrame(
    {
        "Entity Type": ["foo" for _ in range(20)],
        "Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
        "AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
        "bar": [random.uniform(1, 999) for _ in range(20)],
        "baz": [random.uniform(1, 999) for _ in range(20)],
    }
)
df = df.pivot(
    index=["Entity Type", "Quantile Range"],
    columns="AggregationMethod",
    values=["bar", "baz"],
).reindex(["max", "avg", "std", "total"], level=1, axis=1)
print(df)
# Output

                                   bar
AggregationMethod                  max         avg         std       total   
Entity Type Quantile Range
foo         q00_q70         141.752307  822.270987  199.740853  595.444166  \
            q70_q90         383.574450  410.730888  838.562828  545.299705   
            q90_q99         339.588340  606.983173  935.142608  407.674059   
            q99_q100        161.833517  932.267262  157.149458  618.105967   

                                   baz
AggregationMethod                  max         avg         std       total  
Entity Type Quantile Range
foo         q00_q70          17.986766  298.760555  389.559554   49.925246  
            q70_q90         888.435092  695.713473  502.429534  209.356226  
            q90_q99         715.425998  209.749918  136.480141  525.729657  
            q99_q100        705.721265  956.273655  684.883477   39.114393 

答案2

得分: 1

你可以使用 .unstack 来实现这个功能,它的作用就像其名称所示,移除 MultiIndex 的层级以创建一个多列索引:

df.unstack()

这将默认取消堆叠最后一个层级,但如果你想指定其他层级,可以使用 level="AggregationMethod"level=-1(默认值)或 level=2

英文:

Posting my comment as an answer:

You can use .unstack for this, which does as the name suggests, removing the level of the MultiIndex to create a multi column index:

df.unstack()

This unstacks the last level by default, but you can specify this if you want to either with level="AggregationMethod", level=-1 (the default) or level=2.

huangapple
  • 本文由 发表于 2023年6月15日 02:50:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76476711.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定