英文:
Is it possible to take one portion of a multi-index from the rows of a dataframe and apply it as columns
问题
我有一个带有多级索引行和单个索引列的数据框。"foo"只是第一个实体类型,我会有其他实体类型,这将使数据框更长且难以查看。
我想将聚合方法的索引列移动到列中,以便我有一个如下的表格:
实体类型 | 分位数范围 | bar | baz | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
max | avg | std | total | max | avg | std | total | ||||
foo | q99-q100 | ||||||||||
foo | q90-q99 | ||||||||||
foo | q70-q90 | ||||||||||
foo | q0-q70 | ||||||||||
quix | q99-q100 | ||||||||||
quix | q90-q99 | ||||||||||
quix | q70-q90 | ||||||||||
quix | q0-q70 | ||||||||||
maz | q99-q100 | ||||||||||
maz | q90-q99 | ||||||||||
maz | q70-q90 | ||||||||||
maz | q0-q70 |
有没有一种简单的命令可以做到这一点?类似于Melt或重置索引的操作?我甚至不确定从哪里开始搜索答案。
英文:
I have a dataframe with multi-indexed rows and a single index columns. "foo" is just the first EntityType, I will have others which will make this dataframe longer and harder to view.
I'd like to take the aggregation method index column and move it to the columns so I have a table like this
Entity Type | Quantile Range | bar | baz | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
max | avg | std | total | max | avg | std | total | ||||
foo | q99-q100 | ||||||||||
foo | q90-q99 | ||||||||||
foo | q70-q90 | ||||||||||
foo | q0-q70 | ||||||||||
quix | q99-q100 | ||||||||||
quix | q90-q99 | ||||||||||
quix | q70-q90 | ||||||||||
quix | q0-q70 | ||||||||||
maz | q99-q100 | ||||||||||
maz | q90-q99 | ||||||||||
maz | q70-q90 | ||||||||||
maz | q0-q70 |
Is there an easy command to do this? Something like Melt or reset index? I'm just not even sure where to being googling to get the answer.
答案1
得分: 1
以下是如何使用Pandas进行操作的示例,正如评论中建议的那样,以及重新索引:
import random
import pandas as pd
# Toy dataframe (same as yours after resetting the index)
df = pd.DataFrame(
{
"Entity Type": ["foo" for _ in range(20)],
"Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
"AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
"bar": [random.uniform(1, 999) for _ in range(20)],
"baz": [random.uniform(1, 999) for _ in range(20)],
}
)
df = df.pivot(
index=["Entity Type", "Quantile Range"],
columns="AggregationMethod",
values=["bar", "baz"],
).reindex(["max", "avg", "std", "total"], level=1, axis=1)
print(df)
# Output
bar
AggregationMethod max avg std total
Entity Type Quantile Range
foo q00_q70 141.752307 822.270987 199.740853 595.444166 \
q70_q90 383.574450 410.730888 838.562828 545.299705
q90_q99 339.588340 606.983173 935.142608 407.674059
q99_q100 161.833517 932.267262 157.149458 618.105967
baz
AggregationMethod max avg std total
Entity Type Quantile Range
foo q00_q70 17.986766 298.760555 389.559554 49.925246
q70_q90 888.435092 695.713473 502.429534 209.356226
q90_q99 715.425998 209.749918 136.480141 525.729657
q99_q100 705.721265 956.273655 684.883477 39.114393
英文:
Here is an example of how to do it with Pandas pivot, as suggested in the comments, and reindex:
import random
import pandas as pd
# Toy dataframe (same as yours after reseting the index)
df = pd.DataFrame(
{
"Entity Type": ["foo" for _ in range(20)],
"Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
"AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
"bar": [random.uniform(1, 999) for _ in range(20)],
"baz": [random.uniform(1, 999) for _ in range(20)],
}
)
df = df.pivot(
index=["Entity Type", "Quantile Range"],
columns="AggregationMethod",
values=["bar", "baz"],
).reindex(["max", "avg", "std", "total"], level=1, axis=1)
print(df)
# Output
bar
AggregationMethod max avg std total
Entity Type Quantile Range
foo q00_q70 141.752307 822.270987 199.740853 595.444166 \
q70_q90 383.574450 410.730888 838.562828 545.299705
q90_q99 339.588340 606.983173 935.142608 407.674059
q99_q100 161.833517 932.267262 157.149458 618.105967
baz
AggregationMethod max avg std total
Entity Type Quantile Range
foo q00_q70 17.986766 298.760555 389.559554 49.925246
q70_q90 888.435092 695.713473 502.429534 209.356226
q90_q99 715.425998 209.749918 136.480141 525.729657
q99_q100 705.721265 956.273655 684.883477 39.114393
答案2
得分: 1
你可以使用 .unstack 来实现这个功能,它的作用就像其名称所示,移除 MultiIndex 的层级以创建一个多列索引:
df.unstack()
这将默认取消堆叠最后一个层级,但如果你想指定其他层级,可以使用 level="AggregationMethod"
、level=-1
(默认值)或 level=2
。
英文:
Posting my comment as an answer:
You can use .unstack for this, which does as the name suggests, removing the level of the MultiIndex to create a multi column index:
df.unstack()
This unstacks the last level by default, but you can specify this if you want to either with level="AggregationMethod"
, level=-1
(the default) or level=2
.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论