
How do you sort data in a DataFrame with MultiIndex columns?

Question

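Please forgive my limited vocabulary. I'm still learning the correct terms, and I have just discovered that I created a MultiIndex DataFrame, which I'm now learning to manipulate.

This MultiIndex DataFrame has 30 rows and 546 columns; a smaller sample of it appears in the original English version below.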

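1. How to get a list of the 'level 0' column names.

To get a list of the 'level 0' column names, you can use:

```python
level_0_columns = df.columns.get_level_values(0).unique().tolist()
```

2. How to get a list of the 'level 1' column names.

To get a list of the 'level 1' column names, you can use:

```python
level_1_columns = df.columns.get_level_values(1).unique().tolist()
```

3. For one row (date), extract the data for all 'level 0' columns and one 'level 1' label. For example, all financial data for one company on one day.

To do this, you can use:

```python
date = '2023-01-02'  # the row (date) to select
level_1 = 'aa'       # the 'level 1' label, i.e. the company code
# slice(None) keeps every 'level 0' label (every financial field)
data_for_one_company = df.loc[date, (slice(None), level_1)]
```

4. For one row (date), extract one 'level 0' column with all of its 'level 1' data. For example, volume data for all companies on one day.

To do this, you can use:

```python
date = '2023-01-02'  # the row (date) to select
level_0 = 'A'        # the 'level 0' label, i.e. the financial field
# slice(None) keeps every 'level 1' label (every company)
volume_data_for_all_companies = df.loc[date, (level_0, slice(None))]
```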
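Another option for tasks 3 and 4, not used above, is `DataFrame.xs`, which picks one label on a given column level and, by default, drops that level from the result. This is a minimal sketch on two rows of the sample table; the labels `aa` and `A` are just the sample's placeholders for a company code and a field.

```python
import pandas as pd

# Two rows of the sample: outer level = fields, inner level = company codes
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(["2023-01-02", "2023-01-03"], name="Date")
df = pd.DataFrame(
    [[1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
     [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22]],
    index=index,
    columns=columns,
)
day = pd.Timestamp("2023-01-02")

# Task 3: every field (level 0) for one company ('aa' on level 1)
one_company = df.xs("aa", axis=1, level=1).loc[day]

# Task 4: one field ('A' on level 0) for every company (level 1)
one_field = df.xs("A", axis=1, level=0).loc[day]

print(one_company)
print(one_field)
```

Because `xs` drops the selected level, the results are indexed by plain labels (`A`, `B`, ... or `aa`, `bb`, ...) rather than by tuples, which can be easier to work with afterwards.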

I hope this code helps you work with MultiIndex DataFrames. If you need more help, feel free to ask.

Original English version:

First, please forgive my bad vocabulary. I'm still struggling with the correct terms, and have just discovered that I have created a multiindexed dataframe, which I'm trying to learn how to manipulate.

The multiindex dataframe has 30 rows and 546 columns, and looks like a bigger version of this:

```
            A           B           C           D
            aa  bb  cc  aa  bb  cc  aa  bb  cc  aa  bb  cc
Date
2023-01-02   1  24   6   3   2   7   3  10  12   5   9  21
2023-01-03   1  23   7   3   4   6   3   9  13   6  10  22
2023-01-04   2  22   8   4   6   7   3   9  12   8  14  24
2023-01-05   3  21  10   3   8   6   4   8  11  10  12  21
```
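To experiment with the selections discussed on this page, the sample table can be rebuilt in a few lines. This is a minimal sketch that assumes the outer (level 0) column level holds the fields, the inner (level 1) level holds the company codes, and the row index is a `DatetimeIndex` named `Date`; the real Yahoo download may name or order its levels differently.

```python
import pandas as pd

# Outer level: financial fields; inner level: company codes (sample labels only)
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
```

`MultiIndex.from_product` builds every (field, company) pair, which matches the "same second-level members under every top-level label" layout described in the question.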

The index is a timestamp date, and the top-level (level 0?) column indexes A, B, C, D, etc. each have the same 91 second-level (level 1?) members: aa, bb, cc, etc.

Since there are 546 columns in total and 91 'level 1' columns, there must be 6 'level 0' columns. I can't see them because the table is so big that it only shows the first and last.
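The number of distinct labels on each level can also be counted directly rather than inferred. A small sketch on the sample columns (4 fields × 3 companies = 12 columns); on the full data the same calls would be expected to report 6 and 91, assuming every field is present for every company.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
df = pd.DataFrame(0, index=range(1), columns=columns)  # values don't matter here

n_total = df.shape[1]                                # total number of columns
n_level0 = df.columns.get_level_values(0).nunique()  # distinct fields
n_level1 = df.columns.get_level_values(1).nunique()  # distinct company codes
print(n_total, n_level0, n_level1)

# For a complete "every field x every company" grid, the total is the product
assert n_total == n_level0 * n_level1
```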

In reality, it's a table of stock data pulled from Yahoo, where A, B, C, etc. are the (6) financial values like close, volume, high, etc., and aa, bb, cc, etc. are the (91) company codes.

I'd like to learn how to do the following:

  1. How to pull off a list of the 'level 0' column names.

  2. How to pull off a list of the 'level 1' column names.

  3. For 1 row (date), pull out the data for ALL 'level 0' and ONE 'level 1' index. (For example, all financial data for one company on one day).

  4. For 1 row (date), ONE 'level 0' with ALL 'level 1' data. For example, volume data for all companies on one day.

I've been trying things like:

```python
df.loc[:, (['A', 'B'], ['aa', 'bb', 'cc'])]
df.loc['2023-01-02', :]
```

which work, but I can't get the brackets and colons right to do the above.

Also,

```python
df.loc[:, (['A', 'D'], ['aa', 'cc', 'ff'])]
```

and

```python
df.loc['2023-01-05':, (['A', 'C'], ['aa', 'dd'])]
```

work, but

```python
df.loc['2023-01-05',([A:],[aa,dd])]
```

and

```python
df.loc['2023-01-05',(A:,[aa,dd])]
```

both fail with `SyntaxError: invalid syntax`. Can anyone explain, or maybe point me towards a tutorial that covers the level definitions and when to use round/square brackets and colons?

Thanks.

Answer 1

Score: 3


To get a list of the column names on a given level, you can use get_level_values:

```python
df.columns.get_level_values(0)
# Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], dtype='object')
df.columns.get_level_values(1)
# Index(['aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc'], dtype='object')
df.columns.get_level_values(0).unique()
# Index(['A', 'B', 'C', 'D'], dtype='object')
df.columns.get_level_values(1).unique()
# Index(['aa', 'bb', 'cc'], dtype='object')
```
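A shorter spelling of the same idea is `Index.unique(level=...)`, followed by `.tolist()` if a plain Python list is wanted. It may also be worth knowing that `df.columns.levels` is not a safe substitute: a MultiIndex keeps every label it was built with, even after a selection no longer uses some of them. A sketch using the sample labels:

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
df = pd.DataFrame(0, index=range(2), columns=columns)  # values are irrelevant here

# Distinct labels per level, as plain Python lists
fields = df.columns.unique(level=0).tolist()
companies = df.columns.unique(level=1).tolist()
print(fields, companies)  # ['A', 'B', 'C', 'D'] ['aa', 'bb', 'cc']

# After selecting only fields A and B, .levels still lists C and D
sub = df[["A", "B"]]
print(sub.columns.levels[0].tolist())                         # ['A', 'B', 'C', 'D']
print(sub.columns.unique(level=0).tolist())                   # ['A', 'B']
print(sub.columns.remove_unused_levels().levels[0].tolist())  # ['A', 'B']
```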

For 3 and 4, pd.IndexSlice would be convenient to use:

```python
# all level zero data for a specific level one index
df.loc['2023-01-05', pd.IndexSlice[:, 'aa']]
# A  aa     3
# B  aa     3
# C  aa     4
# D  aa    10
# Name: 2023-01-05, dtype: int64

# all level one data for a specific level zero index
df.loc['2023-01-05', pd.IndexSlice['A', :]]
# A  aa     3
#    bb    21
#    cc    10
# Name: 2023-01-05, dtype: int64
```
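On the `[A:]` attempts in the question: `a:b` slice syntax is only valid directly inside square-bracket subscripts, so it cannot appear inside a tuple `( )` or a list literal `[ ]`. `pd.IndexSlice` exists to turn subscript syntax into slice objects, and the built-in `slice()` spells the same thing out explicitly. Label ranges on a MultiIndex also generally need sorted labels, hence the `sort_index(axis=1)` call. A sketch on two rows of the sample table (which has no `dd` column, so `cc` stands in for it):

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(["2023-01-04", "2023-01-05"], name="Date")
df = pd.DataFrame(
    [[2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
     [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21]],
    index=index,
    columns=columns,
)
# Label ranges on MultiIndex levels need sorted labels; sort once up front
df = df.sort_index(axis=1)
day = pd.Timestamp("2023-01-05")

# Invalid:  df.loc[day, (['A':], ['aa', 'cc'])]  -> SyntaxError
# pd.IndexSlice is itself subscripted with [...], so ':' is allowed inside it
ix = pd.IndexSlice
from_a = df.loc[day, ix["A":, ["aa", "cc"]]]

# Same selection with the built-in slice(): 'A': becomes slice("A", None)
from_a_explicit = df.loc[day, (slice("A", None), ["aa", "cc"])]

# A bounded label range is inclusive: fields B through C, all companies
b_to_c = df.loc[day, ix["B":"C", :]]
print(from_a)
print(b_to_c)
```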

huangapple
  • Posted on 9 January 2023, 03:46:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75050779.html