Adding conditional columns to multi-level column dataframe

Question

I have a dataframe with multi-level columns, and I need to add a column at level_2 based on conditions in other columns. The new column should be added to every level_1 group (A1, B1, C1, and so on). This dataframe is just an example of the larger one I'm working on. Level_1 is dynamic and can change, for example by adding Z1 or removing B1.

    import pandas as pd
    import numpy as np
    level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
    level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
    data = [
        ['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'],
        ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'],
        ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'],
        ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'],
        ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'],
        ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'],
        ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7'],
    ]
    columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
    df1 = pd.DataFrame(data, columns=columns)
    date = ['1/1/2023', '1/2/2023', '1/3/2023', '1/4/2023', '1/5/2023', '1/6/2023', '1/7/2023']
    df1.insert(0, 'date', date)
    df1.set_index('date', inplace=True)

I've tried the below code, which works, but I am wondering if there is a more efficient way to do this, without looping? Thank you.

    for column_name in df1.columns.get_level_values(0).unique():
        df1.loc[(df1[column_name, 'b_1'] > 30) | (df1[column_name, 'a_1'] == 'c'), (column_name, 'e_1')] = 1
    df1 = df1.reindex(columns=['A1', 'B1', 'C1'], level=0)

Answer 1

Score: 2

There is a simpler way using reshaping: stack the level-0 column labels into the row index, assign the new column e_1 based on the required condition, then unstack to restore the original shape.

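    # Move the level-0 labels (A1, B1, C1) from the columns into the row index
    s = df1.stack(level=0)
    # Set e_1 to 1 where a_1 equals 'e' and b_1 is greater than 30
    s.loc[s['a_1'].eq('e') & s['b_1'].gt(30), 'e_1'] = 1
    # Restore the wide layout: group labels back on top, columns sorted
    s = s.unstack().swaplevel(axis=1).sort_index(axis=1)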

               A1                   B1                   C1
              a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1
    date
    1/1/2023    a   23    h  NaN    o   45    v  NaN   a3    1   b1  NaN
    1/2/2023    b   34    i  NaN    p    3    w  NaN   a4   32   b2  NaN
    1/3/2023    c    5    j  NaN    q    7    x  NaN   a5    6   b3  NaN
    1/4/2023    d    2    k  NaN    r    5    y  NaN   a6   76   b4  NaN
    1/5/2023    e   78    l  1.0    s   65    z  NaN   a7    9   b5  NaN
    1/6/2023    f   98    m  NaN    t   23   a1  NaN   a8   14   b6  NaN
    1/7/2023    g    3    n  NaN    u    1   a2  NaN   a9   45   b7  NaN
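
The output above flags only the 1/5/2023 row of A1, so the condition used in this answer is not the one from the question. As a rough sketch, the same stack/unstack idea could be adapted to the question's rule (b_1 > 30 or a_1 == 'c') and made to write 0 instead of NaN. The small frame, the names df, flag and out, and the use of pd.to_numeric are assumptions for illustration only, not part of the original answer.

```python
import pandas as pd

# Small made-up frame with the same column layout as the question
columns = pd.MultiIndex.from_arrays([['A1', 'A1', 'B1', 'B1'],
                                     ['a_1', 'b_1', 'a_1', 'b_1']])
index = pd.Index(['1/1/2023', '1/2/2023', '1/3/2023'], name='date')
df = pd.DataFrame([['a', 23, 'o', 45],
                   ['c', 5, 'q', 7],
                   ['e', 78, 's', 65]], columns=columns, index=index)

# Long format: one row per (date, group), one column per level-1 label
s = df.stack(level=0)

# Question's rule: b_1 > 30 OR a_1 == 'c'.
# to_numeric is a precaution in case stacking leaves b_1 as object dtype.
flag = pd.to_numeric(s['b_1']).gt(30) | s['a_1'].eq('c')
s['e_1'] = flag.astype(int)

# Back to wide format with the group labels on top
out = s.unstack().swaplevel(axis=1).sort_index(axis=1)
print(out)
```

Because nothing here names A1 or B1 explicitly, the same lines should keep working when level-0 groups are added or removed. Note that sort_index orders the groups alphabetically rather than preserving their original order, and recent pandas versions may emit a FutureWarning from stack.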

Answer 2

Score: 1

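Maybe you can use .xs to take a cross-section of the columns at level 1, build the condition for all groups at once, and concatenate the result: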

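    # One boolean column per level-0 group: is b_1 greater than 30?
    x = df1.xs('b_1', axis=1, level=1) > 30
    # One boolean column per level-0 group: does a_1 equal 'c'?
    y = df1.xs('a_1', axis=1, level=1).eq('c')
    # Combine with OR and convert True/False to 1/0
    z = (x | y).astype(int)
    # Label the new columns (A1, e_1), (B1, e_1), (C1, e_1)
    z.columns = pd.MultiIndex.from_product([z.columns, ['e_1']])
    # Append, then regroup the columns by their level-0 label
    df1 = pd.concat([df1, z], axis=1).reindex(columns=['A1', 'B1', 'C1'], level=0)
    print(df1)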

Prints:

               A1                   B1                   C1
              a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1  a_1  b_1  c_1  e_1
    date
    1/1/2023    a   23    h    0    o   45    v    1   a3    1   b1    0
    1/2/2023    b   34    i    1    p    3    w    0   a4   32   b2    1
    1/3/2023    c    5    j    1    q    7    x    0   a5    6   b3    0
    1/4/2023    d    2    k    0    r    5    y    0   a6   76   b4    1
    1/5/2023    e   78    l    1    s   65    z    1   a7    9   b5    0
    1/6/2023    f   98    m    1    t   23   a1    0   a8   14   b6    0
    1/7/2023    g    3    n    0    u    1   a2    0   a9   45   b7    1
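
Since the question says level_1 is dynamic, the hard-coded ['A1', 'B1', 'C1'] list could be replaced by the group labels read from the frame itself. The following is only a sketch of that idea: the helper name add_flag_column, its new_name parameter, and the Z1/A1 sample frame are hypothetical and not from the original answer.

```python
import pandas as pd

def add_flag_column(df, new_name='e_1'):
    """Hypothetical helper: add a 0/1 column to every level-0 group."""
    # Remember the groups in their current order instead of hard-coding them
    groups = df.columns.get_level_values(0).unique()
    x = df.xs('b_1', axis=1, level=1) > 30       # b_1 > 30, per group
    y = df.xs('a_1', axis=1, level=1).eq('c')    # a_1 == 'c', per group
    z = (x | y).astype(int)
    z.columns = pd.MultiIndex.from_product([z.columns, [new_name]])
    # Append the new columns, then regroup them under their level-0 label
    return pd.concat([df, z], axis=1).reindex(columns=groups, level=0)

# Made-up frame whose groups are not in alphabetical order
columns = pd.MultiIndex.from_arrays([['Z1', 'Z1', 'A1', 'A1'],
                                     ['a_1', 'b_1', 'a_1', 'b_1']])
df = pd.DataFrame([['a', 23, 'o', 45],
                   ['c', 5, 'q', 7]], columns=columns)

out = add_flag_column(df)
print(out)
```

Reading the labels with get_level_values(0).unique() keeps whatever groups are present, in the order they appear, so adding Z1 or dropping B1 should need no code change.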

huangapple
  • Published on June 16, 2023, 01:50:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484301.html