Is it possible to take one portion of a multi-index from the rows of a DataFrame and apply it as columns?

Question

I have a dataframe with multi-indexed rows and single-level columns. "foo" is just the first EntityType; I will have others, which will make this dataframe longer and harder to view.

I'd like to take the aggregation method index column and move it to the columns so I have a table like this

    Entity Type  Quantile Range  bar                     baz
                                 max  avg  std  total    max  avg  std  total
    foo          q99-q100
    foo          q90-q99
    foo          q70-q90
    foo          q0-q70
    quix         q99-q100
    quix         q90-q99
    quix         q70-q90
    quix         q0-q70
    maz          q99-q100
    maz          q90-q99
    maz          q70-q90
    maz          q0-q70

Is there an easy command to do this? Something like melt or reset_index? I'm not even sure where to begin googling for the answer.

Answer 1

Score: 1

Here is an example of how to do it with Pandas pivot, as suggested in the comments, and reindex:

    import random

    import pandas as pd

    # Toy dataframe (same as yours after resetting the index)
    df = pd.DataFrame(
        {
            "Entity Type": ["foo" for _ in range(20)],
            "Quantile Range": ["q99_q100", "q90_q99", "q70_q90", "q00_q70"] * 5,
            "AggregationMethod": ["max", "avg", "std", "total", "count"] * 4,
            "bar": [random.uniform(1, 999) for _ in range(20)],
            "baz": [random.uniform(1, 999) for _ in range(20)],
        }
    )

    # Move AggregationMethod into the columns, then keep only the four
    # wanted methods (dropping "count") in the desired order.
    df = df.pivot(
        index=["Entity Type", "Quantile Range"],
        columns="AggregationMethod",
        values=["bar", "baz"],
    ).reindex(["max", "avg", "std", "total"], level=1, axis=1)

    print(df)

Output:

                                       bar
    AggregationMethod                  max         avg         std       total
    Entity Type Quantile Range
    foo         q00_q70         141.752307  822.270987  199.740853  595.444166  \
                q70_q90         383.574450  410.730888  838.562828  545.299705
                q90_q99         339.588340  606.983173  935.142608  407.674059
                q99_q100        161.833517  932.267262  157.149458  618.105967

                                       baz
    AggregationMethod                  max         avg         std       total
    Entity Type Quantile Range
    foo         q00_q70          17.986766  298.760555  389.559554   49.925246
                q70_q90         888.435092  695.713473  502.429534  209.356226
                q90_q99         715.425998  209.749918  136.480141  525.729657
                q99_q100        705.721265  956.273655  684.883477   39.114393
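
The toy frame above starts from ordinary columns. As a minimal sketch, assuming the original frame still carries a three-level row index named "Entity Type", "Quantile Range" and "AggregationMethod", the same pivot can be reached by calling reset_index first. The entity names and integer values below are made up for illustration so that the result is reproducible.

```python
import itertools

import pandas as pd

# Hypothetical data: labels and values are invented for this sketch.
entities = ["foo", "quix"]
quantiles = ["q0-q70", "q70-q90", "q90-q99", "q99-q100"]
methods = ["max", "avg", "std", "total"]

rows = list(itertools.product(entities, quantiles, methods))
index = pd.MultiIndex.from_tuples(
    rows, names=["Entity Type", "Quantile Range", "AggregationMethod"]
)
multi = pd.DataFrame(
    {"bar": range(len(rows)), "baz": range(100, 100 + len(rows))},
    index=index,
)

# Turn the index levels into ordinary columns, then pivot as in the answer.
wide = (
    multi.reset_index()
    .pivot(
        index=["Entity Type", "Quantile Range"],
        columns="AggregationMethod",
        values=["bar", "baz"],
    )
    # Restore the preferred method order within each of bar and baz.
    .reindex(methods, level=1, axis=1)
)
print(wide)
```

Passing a list to the index argument of pivot needs a reasonably recent pandas (1.1 or later, as far as I know). pivot also raises an error if any (index, columns) combination appears more than once, so this only works when each entity/quantile/method triple is unique.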

Answer 2

Score: 1

Posting my comment as an answer:

You can use .unstack for this. As the name suggests, it removes a level from the row MultiIndex and turns it into a level of the column index:

    df.unstack()

This unstacks the last level by default, but you can name the level explicitly with level="AggregationMethod", level=-1 (the default), or level=2.
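
To make the level choices concrete, here is a small sketch on a made-up frame. It assumes a three-level row index whose innermost level is named "AggregationMethod", as in the question; the values are invented. It checks that the three spellings of the level give the same result, then reorders the methods, since unstack returns the new column labels in sorted order rather than in the order they appeared.

```python
import itertools

import pandas as pd

# Hypothetical data: labels follow the question, values are invented.
entities = ["foo", "quix", "maz"]
quantiles = ["q99-q100", "q90-q99", "q70-q90", "q0-q70"]
methods = ["max", "avg", "std", "total"]

rows = list(itertools.product(entities, quantiles, methods))
df = pd.DataFrame(
    {
        "bar": [float(i) for i in range(len(rows))],
        "baz": [float(i) * 10 for i in range(len(rows))],
    },
    index=pd.MultiIndex.from_tuples(
        rows, names=["Entity Type", "Quantile Range", "AggregationMethod"]
    ),
)

by_name = df.unstack(level="AggregationMethod")
by_position = df.unstack(level=2)
by_default = df.unstack()  # same as level=-1

# All three calls select the same (innermost) row level.
assert by_name.equals(by_position) and by_name.equals(by_default)

# Put the methods in the preferred order within each of bar and baz.
wide = by_name.reindex(methods, level=1, axis=1)
print(wide)
```

The result has bar and baz on the top column level and the aggregation methods underneath, which is the layout asked for in the question. Note that unstack also sorts the remaining row labels, so the quantile ranges come out in lexicographic order.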

Posted by huangapple on June 15, 2023, 02:50:14
When reposting, please keep the link to this article: https://go.coder-hub.com/76476711.html