2023-06-19 20:47:17

Dropped columns reappear in columns.levels

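Question

I have a DataFrame whose columns are a MultiIndex.

When I drop a column (for example, one containing a NaN), its name still appears when I call df.columns.levels[1].

Minimal working example: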

import numpy as np
import pandas as pd

# Create the DataFrame
midx = pd.MultiIndex.from_tuples([('A', 'aa'), ('A', 'bb'), ('B', 'cc'), ('B', 'dd')])
mydf = pd.DataFrame(np.random.randn(5, 4), columns=midx)
mydf.loc[1, ('B', 'cc')] = np.nan
print(mydf)
>>        A                   B          
         aa        bb        cc        dd
0 -0.565250 -1.267290 -1.811422 -0.242648
1  0.138827  0.182022       NaN -0.286807
2  0.037163 -1.867622  1.259539 -0.485333
3  1.283082  1.030154  0.678748 -0.200731
4 -0.405116 -0.963670 -0.405438 -1.695403
# Drop every column that contains a NaN
mydf.dropna(how='any', axis=1, inplace=True)
print(mydf)
>>        A                   B
         aa        bb        dd
0 -0.565250 -1.267290 -0.242648
1  0.138827  0.182022 -0.286807
2  0.037163 -1.867622 -0.485333
3  1.283082  1.030154 -0.200731
4 -0.405116 -0.963670 -1.695403
mydf.columns.levels[1]
>> Index(['aa', 'bb', 'cc', 'dd'], dtype='object')

Alternatives I have tried, all with the same result:

new_df = mydf.dropna(how='any', axis=1)
new_df = mydf.dropna(how='any', axis=1).copy()

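I need the list of column names that are actually present on level 1.
I have found a workable workaround, but I would like to understand why the code above does not behave as I expected.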

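Answer 1

Score: 2

Don't confuse the MultiIndex itself (a combination of single indexes) with its individual Index levels. Each level stores the full set of labels it was built with, while the MultiIndex exposes only the label combinations currently in use: a visible subset (at most the Cartesian product) of the levels that compose it. Dropping a column removes its combination from that subset but leaves the labels stored in each level untouched.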

# Index, level 0
>>> mydf.columns.levels[0]
Index(['A', 'B'], dtype='object')
# Index, level 1
>>> mydf.columns.levels[1]
Index(['aa', 'bb', 'cc', 'dd'], dtype='object')
# Values, level 0
>>> mydf.columns.get_level_values(0)
Index(['A', 'A', 'B'], dtype='object')
# Values, level 1
>>> mydf.columns.get_level_values(1)
Index(['aa', 'bb', 'dd'], dtype='object')
# Cartesian product / dense MultiIndex
>>> pd.MultiIndex.from_product([mydf.columns.levels[0], mydf.columns.levels[1]])
MultiIndex([('A', 'aa'),
            ('A', 'bb'),
            ('A', 'cc'),
            ('A', 'dd'),
            ('B', 'aa'),
            ('B', 'bb'),
            ('B', 'cc'),
            ('B', 'dd')],
           )
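The split between stored labels and labels in use can also be seen through the index's integer codes. The following is only a sketch: it rebuilds the question's column layout with deterministic data (np.arange instead of random values, an assumption made for reproducibility) and inspects the levels and codes attributes after the drop.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question, filled with deterministic values.
midx = pd.MultiIndex.from_tuples([("A", "aa"), ("A", "bb"), ("B", "cc"), ("B", "dd")])
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), columns=midx)
df.loc[1, ("B", "cc")] = np.nan

# Drop every column that contains a NaN; only ('B', 'cc') is affected.
dropped = df.dropna(how="any", axis=1)

# The level still holds every label it was built with ...
level1_labels = dropped.columns.levels[1].tolist()
# ... while the codes are positions into that level for the remaining columns.
level1_codes = dropped.columns.codes[1].tolist()

print(level1_labels)  # ['aa', 'bb', 'cc', 'dd']
print(level1_codes)   # [0, 1, 3]
```

Position 2 ('cc') is no longer referenced by any code, which is why the label is still listed in levels[1] even though no column uses it.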

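So if a level contains a label that is no longer referenced, as @ScottBoston said, you can use remove_unused_levels. As the pandas documentation puts it:

> To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.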

>>> mydf.columns.remove_unused_levels().levels
FrozenList([['A', 'B'], ['aa', 'bb', 'dd']])
#      level 0 --^     level 1 --^

For more, see the "Defined levels" section of the pandas "MultiIndex / advanced indexing" user guide.
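One detail that is easy to miss is that remove_unused_levels() returns a new MultiIndex and leaves the one it was called on unchanged. A small sketch, assuming a hypothetical index called cols that was built with 'cc' on level 1 but no longer uses it:

```python
import pandas as pd

# Hypothetical index: level 1 was built with 'cc', but no entry uses it any more.
full = pd.MultiIndex.from_tuples([("A", "aa"), ("A", "bb"), ("B", "cc"), ("B", "dd")])
cols = full[[0, 1, 3]]  # positional selection keeps the original levels

pruned = cols.remove_unused_levels()

before = cols.levels[1].tolist()    # the original index is left untouched
after = pruned.levels[1].tolist()   # the returned index has the trimmed level
same_columns = cols.equals(pruned)  # both still describe the same entries

print(before)        # ['aa', 'bb', 'cc', 'dd']
print(after)         # ['aa', 'bb', 'dd']
print(same_columns)  # True
```

Because the call is not in place, the result has to be assigned back (for example to mydf.columns) before levels reflects the change.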

Answer 2

Score: 1

Use pd.MultiIndex.remove_unused_levels:

mydf.columns.levels[1]
#Index(['aa', 'bb', 'cc', 'dd'], dtype='object')
mydf.columns = mydf.columns.remove_unused_levels()
mydf.columns.levels[1]
#Index(['aa', 'bb', 'dd'], dtype='object')
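If the goal is only to read the level-1 names that are still present, without replacing the columns index, get_level_values combined with unique is another option. This is a sketch under the same assumptions as the question's example, with deterministic data substituted for the random values:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with deterministic values.
midx = pd.MultiIndex.from_tuples([("A", "aa"), ("A", "bb"), ("B", "cc"), ("B", "dd")])
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), columns=midx)
df.loc[1, ("B", "cc")] = np.nan
df = df.dropna(how="any", axis=1)

# get_level_values reads the labels actually used by the columns,
# so the dropped 'cc' does not show up; unique() removes repeats.
present = df.columns.get_level_values(1).unique().tolist()

print(present)  # ['aa', 'bb', 'dd']
```

Unlike levels, which is kept sorted here, this returns the labels in order of first appearance among the columns.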
