June 16, 2023 01:50:56

Adding conditional columns to multi-level column dataframe

Question

I have a dataframe with multi-level columns, and I need to add a level_2 column based on conditions in other columns. The new column should be added to every level_1 group: for example, to A1, B1, C1, and so on. This dataframe is just a small example of the larger one I'm working on. Level_1 is dynamic and can change, for example by adding Z1 or removing B1.

import pandas as pd
import numpy as np

level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'],
        ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'],
        ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'],
        ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'],
        ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'],
        ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'],
        ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df1 = pd.DataFrame(data, columns=columns)
date = ['1/1/2023', '1/2/2023', '1/3/2023', '1/4/2023', '1/5/2023', '1/6/2023', '1/7/2023']

df1.insert(0, 'date', date)

df1.set_index('date', inplace=True)

I've tried the below code, which works, but I am wondering if there is a more efficient way to do this, without looping? Thank you.

for column_name in df1.columns.get_level_values(0).unique():
    df1.loc[(df1[column_name, 'b_1'] > 30) | (df1[column_name, 'a_1'] == 'c'), (column_name, 'e_1')] = 1

df1 = df1.reindex(columns=['A1', 'B1', 'C1'], level=0)

Answer 1

Score: 2

There is a simpler way using reshaping. Stack the level-0 column values, assign the new column e_1 based on the required condition, and finally unstack to reshape back to the original form:

s = df1.stack(level=0)
s.loc[s['a_1'].eq('e') & s['b_1'].gt(30), 'e_1'] = 1
s = s.unstack().swaplevel(axis=1).sort_index(axis=1)

          A1               B1              C1            
         a_1 b_1 c_1  e_1 a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1
date                                                     
1/1/2023   a  23   h  NaN   o  45   v NaN  a3   1  b1 NaN
1/2/2023   b  34   i  NaN   p   3   w NaN  a4  32  b2 NaN
1/3/2023   c   5   j  NaN   q   7   x NaN  a5   6  b3 NaN
1/4/2023   d   2   k  NaN   r   5   y NaN  a6  76  b4 NaN
1/5/2023   e  78   l  1.0   s  65   z NaN  a7   9  b5 NaN
1/6/2023   f  98   m  NaN   t  23  a1 NaN  a8  14  b6 NaN
1/7/2023   g   3   n  NaN   u   1  a2 NaN  a9  45  b7 NaN
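
Note that the output above comes from the condition a_1 == 'e' and b_1 > 30, and rows that do not match are left as NaN. As a minimal sketch, the same stack/unstack idea can be written with the condition from the question (b_1 > 30 or a_1 == 'c') and a 0/1 result. The small frame `df` and its values are made up for illustration and are not the question's data.

```python
import pandas as pd

# Hypothetical stand-in frame with the same column layout as the question
columns = pd.MultiIndex.from_product([['A1', 'B1'], ['a_1', 'b_1']])
df = pd.DataFrame(
    [['a', 23, 'o', 45], ['c', 5, 'q', 7], ['e', 78, 's', 65]],
    columns=columns,
    index=['1/1/2023', '1/2/2023', '1/3/2023'],
)

# Move the level-0 labels (A1, B1) into the row index
s = df.stack(level=0)

# One vectorized condition covers every group; cast the booleans to 0/1
s['e_1'] = (s['b_1'].gt(30) | s['a_1'].eq('c')).astype(int)

# Reshape back so that level 0 is the group and level 1 is the field
out = s.unstack().swaplevel(axis=1).sort_index(axis=1)
print(out)
```

Because the condition is evaluated once on the stacked frame, it should pick up whatever level-0 groups are present, with no hardcoded list of names.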

Answer 2

Score: 1

Maybe you can use .xs?

x = df1.xs('b_1', axis=1, level=1) > 30
y = df1.xs('a_1', axis=1, level=1).eq('c')
z = (x | y).astype(int)
z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])

df1 = pd.concat([df1, z], axis=1).reindex(columns=['A1', 'B1', 'C1'], level=0)
print(df1)
print(df1)

Prints:

          A1              B1              C1            
         a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1
date                                                    
1/1/2023   a  23   h   0   o  45   v   1  a3   1  b1   0
1/2/2023   b  34   i   1   p   3   w   0  a4  32  b2   1
1/3/2023   c   5   j   1   q   7   x   0  a5   6  b3   0
1/4/2023   d   2   k   0   r   5   y   0  a6  76  b4   1
1/5/2023   e  78   l   1   s  65   z   1  a7   9  b5   0
1/6/2023   f  98   m   1   t  23  a1   0  a8  14  b6   0
1/7/2023   g   3   n   0   u   1  a2   0  a9  45  b7   1
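
Since the question says the level-0 groups can change, one possible variation is to read the group labels from the frame instead of hardcoding ['A1', 'B1', 'C1']. The following is a sketch under that assumption; the frame `df`, its values, and the variable name `groups` are invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose groups differ from the question's (A1 and Z1)
columns = pd.MultiIndex.from_product([['A1', 'Z1'], ['a_1', 'b_1']])
df = pd.DataFrame([['a', 23, 'o', 45], ['c', 5, 'q', 7]], columns=columns)

# Take the level-0 labels from the frame itself rather than a fixed list
groups = df.columns.get_level_values(0).unique().tolist()

# Same logic as the answer: one cross-section per field, combined with |
x = df.xs('b_1', axis=1, level=1) > 30
y = df.xs('a_1', axis=1, level=1).eq('c')
z = (x | y).astype(int)
z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])

# Append the new columns, then regroup them under their level-0 label
out = pd.concat([df, z], axis=1).reindex(columns=groups, level=0)
print(out)
```

With this change, adding a Z1 group or removing B1 should not require touching the code.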
