June 15, 2023 02:50:14

Is it possible to take one portion of a multi-index from the rows of a dataframe and apply it as columns?

Question

I have a dataframe with multi-indexed rows and a single-level column index. "foo" is just the first Entity Type; I will have others, which will make this dataframe longer and harder to view.

I'd like to take the aggregation method level of the row index and move it to the columns, so that I have a table like this:

Entity Type	Quantile Range	bar				baz
		max	avg	std	total	max	avg	std	total
foo	q99-q100
foo	q90-q99
foo	q70-q90
foo	q0-q70
quix	q99-q100
quix	q90-q99
quix	q70-q90
quix	q0-q70
maz	q99-q100
maz	q90-q99
maz	q70-q90
maz	q0-q70

Is there an easy command to do this? Something like melt or reset_index? I'm not even sure where to begin googling for the answer.

Answer 1

Score: 1

Here is an example of how to do it with Pandas pivot, as suggested in the comments, and reindex:

import random
import pandas as pd
# Toy dataframe (same as yours after resetting the index)
df = pd.DataFrame(
    {
        "Entity Type": ["foo" for _ in range(20)],
        "Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
        "AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
        "bar": [random.uniform(1, 999) for _ in range(20)],
        "baz": [random.uniform(1, 999) for _ in range(20)],
    }
)

# Move AggregationMethod into the columns, then keep only the wanted
# methods (dropping "count") in the desired order
df = df.pivot(
    index=["Entity Type", "Quantile Range"],
    columns="AggregationMethod",
    values=["bar", "baz"],
).reindex(["max", "avg", "std", "total"], level=1, axis=1)

print(df)
# Output
                                   bar
AggregationMethod                  max         avg         std       total   
Entity Type Quantile Range
foo         q00_q70         141.752307  822.270987  199.740853  595.444166  \
            q70_q90         383.574450  410.730888  838.562828  545.299705   
            q90_q99         339.588340  606.983173  935.142608  407.674059   
            q99_q100        161.833517  932.267262  157.149458  618.105967   
                                   baz
AggregationMethod                  max         avg         std       total  
Entity Type Quantile Range
foo         q00_q70          17.986766  298.760555  389.559554   49.925246  
            q70_q90         888.435092  695.713473  502.429534  209.356226  
            q90_q99         715.425998  209.749918  136.480141  525.729657  
            q99_q100        705.721265  956.273655  684.883477   39.114393
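
The toy frame above has a single entity type and starts from flat columns. As a rough sketch of the same pivot-and-reindex approach applied to a frame that still has the three-level row index from the question, something like the following may work. The entity names and quantile labels are taken from the question; the integer values and the names `long_df` and `wide` are made up for illustration, and a pandas version that accepts a list for `index` in `pivot` (1.1 or later) is assumed.

```python
import pandas as pd

entities = ["foo", "quix", "maz"]
ranges = ["q99-q100", "q90-q99", "q70-q90", "q0-q70"]
methods = ["max", "avg", "std", "total"]

# Hypothetical stand-in for the asker's frame: three row-index levels,
# two value columns filled with made-up integers.
index = pd.MultiIndex.from_product(
    [entities, ranges, methods],
    names=["Entity Type", "Quantile Range", "AggregationMethod"],
)
n = len(index)
long_df = pd.DataFrame({"bar": range(n), "baz": range(n, 2 * n)}, index=index)

# reset_index turns the index levels into ordinary columns so that pivot
# can use them; reindex then restores the desired method order.
wide = (
    long_df.reset_index()
    .pivot(
        index=["Entity Type", "Quantile Range"],
        columns="AggregationMethod",
        values=["bar", "baz"],
    )
    .reindex(methods, level=1, axis=1)
)

print(wide.shape)
print(wide)
```

Note that `pivot` sorts the row labels, so the entity types and quantile ranges come out in alphabetical order rather than in the order they were supplied.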

Answer 2

Score: 1

Posting my comment as an answer:

You can use .unstack for this, which does as the name suggests: it moves a level of the row MultiIndex into the columns, producing a multi-level column index:

df.unstack()

This unstacks the last level by default, but you can specify the level explicitly with level="AggregationMethod", level=-1 (the default), or level=2.
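
To make the three spellings of `level` concrete, here is a minimal sketch on a small made-up frame with the same index level names as in the question. The values and the variable names (`by_name`, `by_default`, `by_position`, `ordered`) are illustrative assumptions, not the asker's data. It also shows that the unstacked method columns come out in sorted order, which can be rearranged with `reindex` if a specific order is wanted.

```python
import pandas as pd

methods = ["max", "avg", "std", "total"]

# Hypothetical frame: AggregationMethod is the last (third) row-index level.
index = pd.MultiIndex.from_product(
    [["foo", "quix"], ["q99-q100", "q0-q70"], methods],
    names=["Entity Type", "Quantile Range", "AggregationMethod"],
)
df = pd.DataFrame({"bar": range(16), "baz": range(16, 32)}, index=index)

# Three equivalent ways to pick the level to move into the columns.
by_name = df.unstack(level="AggregationMethod")
by_default = df.unstack()          # level=-1, the last level
by_position = df.unstack(level=2)  # third level, counting from 0

assert by_name.equals(by_default)
assert by_name.equals(by_position)

# The new column level is sorted (avg, max, std, total); reindex on that
# level restores the order used in the question.
ordered = by_name.reindex(methods, level="AggregationMethod", axis=1)
print(ordered)
```

Referring to the level by name is probably the most robust choice, since it keeps working if another index level is later added or the levels are reordered.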
