Adding conditional columns to multi-level column dataframe

Question

I have a dataframe with multi-level columns where I need to add a level_2 column based on conditions in other columns. The added column should be applied to every level_1 group; for example, the conditional column will be added to A1, B1, C1, etc. This dataframe is just a sample of the larger dataframe I'm working on. Level_1 is dynamic and can change, for example by adding Z1 or removing B1.

import pandas as pd
import numpy as np

level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
data = [
    ['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'],
    ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'],
    ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'],
    ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'],
    ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'],
    ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'],
    ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7'],
]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df1 = pd.DataFrame(data, columns=columns)
date = ['1/1/2023','1/2/2023','1/3/2023','1/4/2023','1/5/2023','1/6/2023','1/7/2023']

df1.insert(0, 'date', date)

df1.set_index('date', inplace=True)

I've tried the code below, which works, but is there a more efficient way to do this without looping? Thank you.

for column_name in df1.columns.get_level_values(0).unique():
    df1.loc[(df1[column_name, 'b_1'] > 30) | (df1[column_name, 'a_1'] == 'c'), (column_name,'e_1')] = 1

df1 = df1.reindex(columns=['A1','B1','C1'], level=0)

Answer 1

Score: 2

There is a simpler way using reshaping: stack the level=0 column values, assign the new column e_1 based on the required condition, and finally unstack to reshape back to the original form.

s = df1.stack(level=0)
s.loc[s['a_1'].eq('e') & s['b_1'].gt(30), 'e_1'] = 1
s = s.unstack().swaplevel(axis=1).sort_index(axis=1)

          A1               B1              C1            
         a_1 b_1 c_1  e_1 a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1
date                                                     
1/1/2023   a  23   h  NaN   o  45   v NaN  a3   1  b1 NaN
1/2/2023   b  34   i  NaN   p   3   w NaN  a4  32  b2 NaN
1/3/2023   c   5   j  NaN   q   7   x NaN  a5   6  b3 NaN
1/4/2023   d   2   k  NaN   r   5   y NaN  a6  76  b4 NaN
1/5/2023   e  78   l  1.0   s  65   z NaN  a7   9  b5 NaN
1/6/2023   f  98   m  NaN   t  23  a1 NaN  a8  14  b6 NaN
1/7/2023   g   3   n  NaN   u   1  a2 NaN  a9  45  b7 NaN
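
The snippet in this answer flags rows where a_1 equals 'e' and b_1 is above 30, which is why only one cell is set. As a rough sketch, the same stack/unstack reshape could be applied to the condition from the question (b_1 above 30 or a_1 equal to 'c'), producing 0/1 instead of NaN/1. The small frame `df` and the name `out` are assumptions made up for illustration, and recent pandas versions may emit a FutureWarning about the `stack` implementation.

```python
import pandas as pd

# Hypothetical stand-in for df1: two level_1 groups, each with a_1 and b_1.
columns = pd.MultiIndex.from_product([['A1', 'B1'], ['a_1', 'b_1']])
df = pd.DataFrame(
    [['a', 23, 'o', 45], ['b', 34, 'p', 3], ['c', 5, 'q', 7]],
    columns=columns,
    index=pd.Index(['1/1/2023', '1/2/2023', '1/3/2023'], name='date'),
)

# Move level_1 (A1, B1, ...) from the columns into the row index,
# so a_1 and b_1 become ordinary columns.
s = df.stack(level=0)

# Condition from the question, evaluated once for all groups.
s['e_1'] = (s['b_1'].gt(30) | s['a_1'].eq('c')).astype(int)

# Reshape back: level_1 returns to the columns as the outer level.
out = s.unstack().swaplevel(axis=1).sort_index(axis=1)
print(out)
```

Because the groups are taken from the column index itself, nothing here hardcodes A1, B1, or C1, so adding or removing a level_1 group should not require changing the code.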

Answer 2

Score: 1

Maybe you can use .xs?

x = df1.xs('b_1', axis=1, level=1) > 30
y = df1.xs('a_1', axis=1, level=1).eq('c')
z = (x | y).astype(int)
z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])

df1 = pd.concat([df1, z], axis=1).reindex(columns=['A1','B1','C1'], level=0)
print(df1)

Prints:

          A1              B1              C1            
         a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1 a_1 b_1 c_1 e_1
date                                                    
1/1/2023   a  23   h   0   o  45   v   1  a3   1  b1   0
1/2/2023   b  34   i   1   p   3   w   0  a4  32  b2   1
1/3/2023   c   5   j   1   q   7   x   0  a5   6  b3   0
1/4/2023   d   2   k   0   r   5   y   0  a6  76  b4   1
1/5/2023   e  78   l   1   s  65   z   1  a7   9  b5   0
1/6/2023   f  98   m   1   t  23  a1   0  a8  14  b6   0
1/7/2023   g   3   n   0   u   1  a2   0  a9  45  b7   1
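
Since the question says level_1 is dynamic, the hardcoded list passed to `reindex` could be replaced by the group names read from the columns. The following is only a sketch of that variation: the small frame `df`, the group name Z1, and the variables `groups` and `out` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df1 with a different set of level_1 groups.
columns = pd.MultiIndex.from_product([['A1', 'Z1'], ['a_1', 'b_1']])
df = pd.DataFrame(
    [['a', 23, 'o', 45], ['c', 5, 'q', 7]],
    columns=columns,
)

# Cross-sections on level 1 give one column per level_1 group.
x = df.xs('b_1', axis=1, level=1) > 30
y = df.xs('a_1', axis=1, level=1).eq('c')
z = (x | y).astype(int)
z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])

# Read the group names from the frame instead of listing them by hand.
groups = df.columns.get_level_values(0).unique()
out = pd.concat([df, z], axis=1).reindex(columns=groups, level=0)
print(out)
```

Reading `groups` before the concat keeps the original group order, and the `reindex` call then gathers each new e_1 column next to its own group.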

huangapple
  • Published on June 16, 2023 at 01:50:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484301.html