Adding conditional columns to multi-level column dataframe
Question
I have a DataFrame with multi-level columns, and I need to add a column (level_2) whose values depend on conditions in other columns. The new column should be added under every level_1 group, for example A1, B1, C1, and so on. The DataFrame below is a small example of the larger one I'm working on. Level_1 is dynamic and can change, for example by adding Z1 or removing B1.
import pandas as pd
import numpy as np
level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'], ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'], ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'], ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'], ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'], ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'], ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df1 = pd.DataFrame(data, columns=columns)
date = ['1/1/2023','1/2/2023','1/3/2023','1/4/2023','1/5/2023','1/6/2023','1/7/2023']
df1.insert(0, 'date', date)
df1.set_index('date', inplace=True)
I've tried the code below, which works, but is there a more efficient way to do this without looping? Thank you.
for column_name in df1.columns.get_level_values(0).unique():
    df1.loc[(df1[column_name, 'b_1'] > 30) | (df1[column_name, 'a_1'] == 'c'), (column_name, 'e_1')] = 1
df1 = df1.reindex(columns=['A1', 'B1', 'C1'], level=0)
Answer 1
Score: 2
There is a simpler way using reshaping: stack the level=0 column values, assign the new column e_1 based on the required condition, then unstack to reshape back to the original form.
s = df1.stack(level=0)
s.loc[s['a_1'].eq('e') & s['b_1'].gt(30), 'e_1'] = 1
s = s.unstack().swaplevel(axis=1).sort_index(axis=1)
           A1                  B1                  C1
          a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1
date
1/1/2023    a   23    h  NaN    o   45    v  NaN   a3    1   b1  NaN
1/2/2023    b   34    i  NaN    p    3    w  NaN   a4   32   b2  NaN
1/3/2023    c    5    j  NaN    q    7    x  NaN   a5    6   b3  NaN
1/4/2023    d    2    k  NaN    r    5    y  NaN   a6   76   b4  NaN
1/5/2023    e   78    l  1.0    s   65    z  NaN   a7    9   b5  NaN
1/6/2023    f   98    m  NaN    t   23   a1  NaN   a8   14   b6  NaN
1/7/2023    g    3    n  NaN    u    1   a2  NaN   a9   45   b7  NaN
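The output above flags only the one row where a_1 is 'e' and b_1 exceeds 30, which is not the rule from the question (b_1 > 30 or a_1 == 'c'). As a minimal sketch, the same stack/unstack reshaping can be combined with the question's condition and a 0/1 result. The names `dates` and `out` are introduced here for illustration, and recent pandas releases may emit a FutureWarning about the implementation of `stack`.

```python
import pandas as pd

# Rebuild the example frame from the question.
level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
level_2 = ['a_1', 'b_1', 'c_1'] * 3
data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'],
        ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'],
        ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'],
        ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'],
        ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'],
        ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'],
        ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]
columns = pd.MultiIndex.from_arrays([level_1, level_2])
dates = [f'1/{day}/2023' for day in range(1, 8)]  # illustrative name
df1 = pd.DataFrame(data, columns=columns, index=pd.Index(dates, name='date'))

# Move the level-0 labels (A1, B1, C1) into the row index, leaving one
# plain column per level-1 label (a_1, b_1, c_1).
s = df1.stack(level=0)

# The question's rule: flag rows where b_1 > 30 or a_1 == 'c'.
s['e_1'] = (s['b_1'].gt(30) | s['a_1'].eq('c')).astype(int)

# Reshape back to the original two-level column layout.
out = s.unstack().swaplevel(axis=1).sort_index(axis=1)
print(out)
```

Because the condition is evaluated once on the stacked frame, the level-0 groups never have to be listed by name, so adding Z1 or removing B1 needs no code change.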
Answer 2
Score: 1
Maybe you can use .xs?
x = df1.xs('b_1', axis=1, level=1) > 30
y = df1.xs('a_1', axis=1, level=1).eq('c')
z = (x | y).astype(int)
z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])
df1 = pd.concat([df1, z], axis=1).reindex(columns=['A1','B1','C1'], level=0)
print(df1)
Prints:
           A1                  B1                  C1
          a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1
date
1/1/2023    a   23    h    0    o   45    v    1   a3    1   b1    0
1/2/2023    b   34    i    1    p    3    w    0   a4   32   b2    1
1/3/2023    c    5    j    1    q    7    x    0   a5    6   b3    0
1/4/2023    d    2    k    0    r    5    y    0   a6   76   b4    1
1/5/2023    e   78    l    1    s   65    z    1   a7    9   b5    0
1/6/2023    f   98    m    1    t   23   a1    0   a8   14   b6    0
1/7/2023    g    3    n    0    u    1   a2    0   a9   45   b7    1
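Since the question says level_1 can change, the hard-coded ['A1', 'B1', 'C1'] in the reindex call may need updating whenever a group is added or removed. One possible variation, sketched below, reads the group order from the frame itself. The helper name `add_flag` and its `flag` parameter are hypothetical and not part of the original answer.

```python
import pandas as pd

def add_flag(df, flag='e_1'):
    """Hypothetical helper: add a 0/1 column under every level-0 group."""
    x = df.xs('b_1', axis=1, level=1) > 30     # one boolean column per group
    y = df.xs('a_1', axis=1, level=1).eq('c')
    z = (x | y).astype(int)
    z.columns = pd.MultiIndex.from_product([z.columns, [flag]])
    # Take the group order from the frame instead of hard-coding it.
    order = list(df.columns.get_level_values(0).unique())
    return pd.concat([df, z], axis=1).reindex(columns=order, level=0)

# Rebuild the example frame from the question.
level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
level_2 = ['a_1', 'b_1', 'c_1'] * 3
data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'],
        ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'],
        ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'],
        ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'],
        ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'],
        ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'],
        ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]
columns = pd.MultiIndex.from_arrays([level_1, level_2])
dates = [f'1/{day}/2023' for day in range(1, 8)]
df1 = pd.DataFrame(data, columns=columns, index=pd.Index(dates, name='date'))

result = add_flag(df1)
# The same call works after a group is removed, with no list to edit.
smaller = add_flag(df1.drop(columns='B1', level=0))
print(result)
print(smaller)
```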