January 9, 2023, 03:46:35

How do you sort data with multiindex (columns) dataframe?

Question

First, please forgive my bad vocabulary. I'm still struggling with the correct terms, and have just discovered that I have created a multiindexed dataframe, which I'm trying to learn how to manipulate.

The multiindex dataframe has 30 rows and 546 columns, and looks like a bigger version of this:

             A           B           C           D
            aa  bb  cc  aa  bb  cc  aa  bb  cc  aa  bb  cc
Date
2023-01-02   1  24   6   3   2   7   3  10  12   5   9  21
2023-01-03   1  23   7   3   4   6   3   9  13   6  10  22
2023-01-04   2  22   8   4   6   7   3   9  12   8  14  24
2023-01-05   3  21  10   3   8   6   4   8  11  10  12  21

The index is a timestamp date, and the top level (level 0?) column indexes A, B, C, D, etc. each have the same 91 second level (level 1?) members: aa, bb, cc, etc.

Since there are 546 columns in total, and 91 'level 1' columns, there must be 6 'level 0' columns. I can't see them because the table's so big it just shows the first and last.

In reality, it's a table of stock data pulled off Yahoo, where A, B, C are the (6) financial values like close, volume, high, etc. and aa, bb, cc, etc. are the (91) company codes.

I'd like to learn how to do the following:

1. How to pull off a list of the 'level 0' column names.
2. How to pull off a list of the 'level 1' column names.
3. For 1 row (date), pull out the data for ALL 'level 0' and ONE 'level 1' index. For example, all financial data for one company on one day.
4. For 1 row (date), pull out the data for ONE 'level 0' and ALL 'level 1' indexes. For example, volume data for all companies on one day.

I've been trying things like:

df.loc[:, (['A', 'B'], ['aa', 'bb', 'cc'])]
df.loc['2023-01-02', :]

which work, but I can't sort the brackets and colons right to do the above stuff.

Also,

df.loc[:, (['A', 'D'], ['aa', 'cc', 'ff'])]

and

df.loc['2023-01-05':, (['A', 'C'], ['aa', 'dd'])]

work, but

df.loc['2023-01-05', (['A':], ['aa', 'dd'])]

and

df.loc['2023-01-05', ('A':, ['aa', 'dd'])]

give invalid syntax. Can anyone explain, or maybe point me towards a tutorial that will help with the level definitions and round/square brackets and colons?

Thanks.

Answer 1

Score: 3

To pull a list of level column names, you can use get_level_values:

df.columns.get_level_values(0)
# Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], dtype='object')
df.columns.get_level_values(1)
# Index(['aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc'], dtype='object')
df.columns.get_level_values(0).unique()
# Index(['A', 'B', 'C', 'D'], dtype='object')
df.columns.get_level_values(1).unique()
# Index(['aa', 'bb', 'cc'], dtype='object')
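
If you want plain Python lists rather than Index objects, something like the following should work. This is only a sketch: it rebuilds the small sample table from the question (the real frame comes from Yahoo and is much larger), and the names level_0_names and level_1_names are just illustrative.

```python
import pandas as pd

# Rebuild the 4 x 12 sample table from the question (assumed layout).
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Unique labels of each column level, as ordinary lists.
level_0_names = df.columns.get_level_values(0).unique().tolist()
level_1_names = df.columns.get_level_values(1).unique().tolist()

print(level_0_names)  # ['A', 'B', 'C', 'D']
print(level_1_names)  # ['aa', 'bb', 'cc']

# Sanity check: every level 0 label is paired with every level 1 label.
print(len(level_0_names) * len(level_1_names) == df.shape[1])  # True
```

On the real data the same check should come out as 6 * 91 == 546. df.columns.levels[0] and df.columns.levels[1] also hold the unique labels, but they can keep labels that are no longer used after you subset the frame, so get_level_values(...).unique() is usually the safer choice.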

For 3 and 4, pd.IndexSlice would be convenient to use:

# all level zero data for a specific level one index
df.loc['2023-01-05', pd.IndexSlice[:, 'aa']]
# A  aa     3
# B  aa     3
# C  aa     4
# D  aa    10
# Name: 2023-01-05, dtype: int64
# all level one data for a specific level zero index
df.loc['2023-01-05', pd.IndexSlice['A', :]]
# A  aa     3
#    bb    21
#    cc    10
# Name: 2023-01-05, dtype: int64
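
As for the invalid syntax: Python only accepts a bare colon directly inside square brackets, so 'A': inside a list or a parenthesised tuple is rejected before pandas ever sees it. Inside a tuple you can spell the slice out with slice(...), or let pd.IndexSlice build it for you. A rough sketch on the sample table from the question (variable names here are only illustrative):

```python
import pandas as pd

# Rebuild the 4 x 12 sample table from the question (assumed layout).
columns = pd.MultiIndex.from_product([["A", "B", "C", "D"], ["aa", "bb", "cc"]])
index = pd.DatetimeIndex(
    ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"], name="Date"
)
data = [
    [1, 24, 6, 3, 2, 7, 3, 10, 12, 5, 9, 21],
    [1, 23, 7, 3, 4, 6, 3, 9, 13, 6, 10, 22],
    [2, 22, 8, 4, 6, 7, 3, 9, 12, 8, 14, 24],
    [3, 21, 10, 3, 8, 6, 4, 8, 11, 10, 12, 21],
]
df = pd.DataFrame(data, index=index, columns=columns)

idx = pd.IndexSlice
day = "2023-01-05"

# slice(None) is the explicit spelling of ":" and is legal inside a tuple.
one_company = df.loc[day, (slice(None), "aa")]
# The same selection written with IndexSlice.
same_one_company = df.loc[day, idx[:, "aa"]]

# Label range on level 0 ('B' onward) combined with a list on level 1.
# Label slicing like this needs the columns to be sorted, as they are here.
from_b = df.loc[day, idx["B":, ["aa", "cc"]]]

# xs selects one label from a level and drops that level from the result.
company_aa = df.xs("aa", axis=1, level=1).loc[day]

# Selecting a single level 0 label also drops that level.
a_all_companies = df["A"].loc[day]

print(one_company.tolist())      # [3, 3, 4, 10]
print(from_b.tolist())           # [3, 6, 4, 11, 10, 21]
print(a_all_companies.tolist())  # [3, 21, 10]
```

Which form to use is mostly taste: IndexSlice keeps the familiar colon notation, while xs or df["A"] is handy when you do not want the fixed level to remain in the result.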
