How do you sort data with multiindex (columns) dataframe?

Question

First, please forgive my limited vocabulary. I'm still struggling with the correct terms, and have just discovered that I have created a multi-indexed dataframe, which I'm trying to learn how to manipulate.

The multiindex dataframe has 30 rows and 546 columns, and looks like a bigger version of this:

             A           B           C           D
            aa  bb  cc  aa  bb  cc  aa  bb  cc  aa  bb  cc
Date
2023-01-02   1  24   6   3   2   7   3  10  12   5   9  21
2023-01-03   1  23   7   3   4   6   3   9  13   6  10  22
2023-01-04   2  22   8   4   6   7   3   9  12   8  14  24
2023-01-05   3  21  10   3   8   6   4   8  11  10  12  21

The index is a timestamp date, and each of the top-level (level 0?) column labels A, B, C, D, etc. has the same 91 second-level (level 1?) members: aa, bb, cc, etc.

Since there are 546 columns in total and 91 'level 1' columns, there must be 6 'level 0' columns. I can't see them because the table is so big that it only shows the first and last ones.

In reality, it's a table of stock data pulled from Yahoo, where A, B, C are the (6) financial values like close, volume, high, etc. and aa, bb, cc, etc. are the (91) company codes.

I'd like to learn how to do the following:

  1. How to pull off a list of the 'level 0' column names.

  2. How to pull off a list of the 'level 1' column names.

  3. For 1 row (date), pull out the data for ALL 'level 0' and ONE 'level 1' index. (For example, all financial data for one company on one day).

  4. For 1 row (date), ONE 'level 0' with ALL 'level 1' data. For example, volume data for all companies on one day.

I've been trying things like:

df.loc[:, (['A','B'], ['aa','bb','cc'])]
df.loc['2023-01-02', :]

which work, but I can't get the brackets and colons right to do the above.

Also,

df.loc[:, (['A','D'], ['aa','cc','ff'])]

and

df.loc['2023-01-05':, (['A','C'], ['aa','dd'])]

work, but

df.loc['2023-01-05', (['A':], ['aa','dd'])]

and

df.loc['2023-01-05', ('A':, ['aa','dd'])]

give invalid syntax. Can anyone explain, or maybe point me towards a tutorial that will help with the level definitions and round/square brackets and colons?

Thanks.

Answer 1

Score: 3

To pull a list of level column names, you can use get_level_values:

df.columns.get_level_values(0)
#Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], dtype='object')

df.columns.get_level_values(1)
#Index(['aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc'], dtype='object')

df.columns.get_level_values(0).unique()
#Index(['A', 'B', 'C', 'D'], dtype='object')

df.columns.get_level_values(1).unique()
#Index(['aa', 'bb', 'cc'], dtype='object')
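
If you want plain Python lists rather than Index objects, here is a minimal sketch. It rebuilds the small sample table from the question (the names `level_0_names` and `level_1_names` are just illustrative), and uses `tolist()` plus `levshape` to check how many labels each level has:

```python
import pandas as pd

# Rebuild the sample table from the question (values copied from it).
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Unique labels of each column level, as ordinary lists.
level_0_names = df.columns.get_level_values(0).unique().tolist()
level_1_names = df.columns.get_level_values(1).unique().tolist()
print(level_0_names)  # ['A', 'B', 'C', 'D']
print(level_1_names)  # ['aa', 'bb', 'cc']

# Number of distinct labels per level, as a tuple.
print(df.columns.levshape)  # (4, 3)
```

On the real 546-column table from the question, `levshape` should come out as (6, 91) if the counts described there are right, which gets around the truncated display.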

For items 3 and 4 in the question, pd.IndexSlice is convenient:

# all level zero data for a specific level one index
df.loc['2023-01-05', pd.IndexSlice[:, 'aa']]

#A  aa     3
#B  aa     3
#C  aa     4
#D  aa    10
#Name: 2023-01-05, dtype: int64
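
As a possible alternative to the IndexSlice call above, `DataFrame.xs` takes a cross-section on one column level and drops that level from the result. This is only a sketch on the sample table from the question; the variable names are illustrative, and a `pd.Timestamp` is used for the row lookup just to make the exact-match explicit:

```python
import pandas as pd

# Rebuild the sample table from the question (values copied from it).
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Every level-0 column for the level-1 label 'aa', for all dates.
# The 'aa' level is dropped, so the columns are just A, B, C, D.
aa_all_days = df.xs("aa", axis=1, level=1)

# Narrow that down to a single date.
day = pd.Timestamp("2023-01-05")
aa_one_day = aa_all_days.loc[day]
print(aa_one_day.tolist())  # [3, 3, 4, 10]
```

The difference from IndexSlice is mostly in the shape of the result: `xs` leaves single-level column labels, which can be easier to work with afterwards.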

# all level one data for a specific level zero index
df.loc['2023-01-05', pd.IndexSlice['A', :]]
#A  aa     3
#   bb    21
#   cc    10
#Name: 2023-01-05, dtype: int64
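
On the invalid-syntax part of the question: in Python the colon shorthand (`:` or `'A':`) is only accepted directly inside square brackets, so it cannot appear inside a parenthesised tuple or a list. Inside a tuple you can spell the same thing with `slice(...)`, or wrap the whole column selector in `pd.IndexSlice[...]`, whose square brackets allow colons. A sketch on the sample table (variable names are illustrative, and 'B' is just an example start label):

```python
import pandas as pd

# Rebuild the sample table from the question (values copied from it).
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)
day = pd.Timestamp("2023-01-05")

# slice(None) means "everything", i.e. what a bare ':' means inside [].
all_metrics_one_company = df.loc[day, (slice(None), "aa")]
one_metric_all_companies = df.loc[day, ("A", slice(None))]

# "From 'B' onwards" on level 0, two chosen labels on level 1.
# slice('B', None) in a tuple is the same as 'B': inside pd.IndexSlice.
subset = df.loc[day:, (slice("B", None), ["aa", "cc"])]
subset_alt = df.loc[day:, pd.IndexSlice["B":, ["aa", "cc"]]]

print(all_metrics_one_company.tolist())   # [3, 3, 4, 10]
print(one_metric_all_companies.tolist())  # [3, 21, 10]
print(subset.equals(subset_alt))          # True
```

One caveat: label slicing on a MultiIndex expects the columns to be lexsorted. If pandas raises an UnsortedIndexError on your real data, sorting the columns first with `df = df.sort_index(axis=1)` should fix it.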

huangapple
  • Posted on January 9, 2023 at 03:46:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75050779.html