July 3, 2023 20:04:21

In a pandas MultiIndex, how do you use IndexSlice without knowing the position of the level?

Question

I have a program that works with pandas dataframes, using a multiindex of 2 levels (dates and data) such as:

Date_Time   Data    
date1       a
            b
            c
date2       a
            b
            c
date3       a
            b
            c
...

So all the functions use the pandas IndexSlice when having to use/modify the contents of the df, like:

df.loc[pd.IndexSlice[:,'a'],:]

This worked great: it is easy to read, short, and efficient, and it made a lot of one-line functions possible.

However, I now have to differentiate the data based on some properties so that they are not merged when resampling, and I do this by adding a third level to the index when necessary:

Date_Time   Property     Data    
date1       1            a
            1            b
            1            c
date2       2            a
            2            b
            2            c
date3       1            a
            1            b
            1            c
...

The goal is to be able to do a groupby with a resample over time and end up with this multiindex:

Date_Time   Property     Data    
Period1     1            a
            1            b
            1            c
            2            a
            2            b
            2            c
Period2     1            a
            1            b
            1            c
...
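
For reference, the regrouping described above might be sketched as follows. This assumes that Date_Time holds real timestamps (not strings) and that a period is 10 minutes; both are assumptions for illustration, as are the frame and column names. pd.Grouper bins the time level, while the other two levels are kept as separate group keys so that different properties are not merged.

```python
import numpy as np
import pandas as pd

names = ["Date_Time", "Property", "Data"]
probes = ["Probe1", "Probe2", "Probe3"]

# Two batches with interleaved timestamps and different properties (made-up data)
idx1 = pd.MultiIndex.from_product(
    [pd.to_datetime(["2023-07-03 07:40:00", "2023-07-03 07:50:00"]), ["S=0.1"], probes],
    names=names,
)
idx2 = pd.MultiIndex.from_product(
    [pd.to_datetime(["2023-07-03 07:45:00", "2023-07-03 07:55:00"]), ["S=0.2"], probes],
    names=names,
)
df = pd.concat(
    [
        pd.DataFrame({"X": np.arange(6.0)}, index=idx1),
        pd.DataFrame({"X": np.arange(6.0, 12.0)}, index=idx2),
    ]
).sort_index()

# Bin Date_Time into 10-minute periods; keep Property and Data as their own keys
regrouped = df.groupby(
    [
        pd.Grouper(level="Date_Time", freq="10min"),
        pd.Grouper(level="Property"),
        pd.Grouper(level="Data"),
    ]
).mean()
```

Each 10-minute period in regrouped then carries both property values, each with its three probes.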

So the problem is that df.loc[pd.IndexSlice[:,'a'],:] no longer works; I would have to change it to

df.loc[pd.IndexSlice[:,:,'a'],:]

But that means changing the code itself every time I use a dataframe with the extra level.

Is there any way to define the slice in a flexible way?

I would like to define the slice using variables, as in a list comprehension, so that it is protected against future changes in the number and order of the multiindex levels. But as far as I can tell, that is not possible, so what should I do?

I could define the slice using try-except blocks at the beginning of each function, inside the block that already makes sure that level and level_value exist; or I could move the property level to the right so that I could still use pd.IndexSlice[:,'a'] (but in the future I might end up with this problem again).

EDIT: Here is some code to generate a dataframe that uses this kind of index:

import numpy as np
import pandas as pd

iter1 = [["03/07/2023 07:40:00", "03/07/2023 07:50:00"], ["S=0.1"], ["Probe1", "Probe2", "Probe3"]]
iter2 = [["03/07/2023 07:45:00", "03/07/2023 07:55:00"], ["S=0.2"], ["Probe1", "Probe2", "Probe3"]]
idx1 = pd.MultiIndex.from_product(iter1, names=["Date_Time", "Property", "Data"])
idx2 = pd.MultiIndex.from_product(iter2, names=["Date_Time", "Property", "Data"])
df_aux1 = pd.DataFrame(np.random.randn(6, 3), index=idx1, columns=["X", "Y", "Error"])
df_aux2 = pd.DataFrame(np.random.randn(6, 3), index=idx2, columns=["X", "Y", "Error"])
df = pd.concat([df_aux1, df_aux2]).sort_index(level="Date_Time")

Answer 1

Score: 1

The exact data and logic are unclear, but since your levels are named, you could use Index.get_level_values with boolean indexing:

df.loc[df.index.get_level_values('Data') == 'a']

Or by position:

df.loc[df.index.get_level_values(-1) == 'a']
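
If you would rather keep the slicing style, one possible sketch is a small helper that builds the .loc tuple from the level names, so the position of a level never appears in the calling code. The helper name level_slicer and the toy frames below are hypothetical, not part of pandas or of the question's code; DataFrame.xs with level= and drop_level=False is an existing pandas method that should give the same rows.

```python
import numpy as np
import pandas as pd


def level_slicer(index, **wanted):
    """Hypothetical helper: return a tuple usable in .loc that holds the
    requested value for each named level and slice(None) (':') elsewhere."""
    unknown = set(wanted) - set(index.names)
    if unknown:
        raise KeyError(f"levels not in index: {sorted(unknown)}")
    return tuple(wanted.get(name, slice(None)) for name in index.names)


# Toy two-level and three-level frames that share the 'Data' level
idx2 = pd.MultiIndex.from_product(
    [["date1", "date2"], ["a", "b", "c"]], names=["Date_Time", "Data"]
)
idx3 = pd.MultiIndex.from_product(
    [["date1", "date2"], [1, 2], ["a", "b", "c"]],
    names=["Date_Time", "Property", "Data"],
)
# sort_index() keeps the index lexsorted, which slicing with .loc may require
df2 = pd.DataFrame({"X": np.arange(6.0)}, index=idx2).sort_index()
df3 = pd.DataFrame({"X": np.arange(12.0)}, index=idx3).sort_index()

# The same call works regardless of how many levels the index has
rows2 = df2.loc[level_slicer(df2.index, Data="a"), :]
rows3 = df3.loc[level_slicer(df3.index, Data="a"), :]

# Built-in alternative: cross-section by level name, keeping all levels
same3 = df3.xs("a", level="Data", drop_level=False)
```

Because the tuple is rebuilt from index.names on every call, adding or reordering levels later should not require touching the functions that use it, as long as the level keeps its name.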
