在多层次数据框中创建一个条件列

huangapple go评论88阅读模式
英文:

creating a conditional column in a multi-level dataframe

问题

  1. import pandas as pd
  2. import numpy as np
  3. level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
  4. level_1 = ['A', 'A', 'B', 'B', 'C', 'C']
  5. data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78], ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]
  6. columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
  7. df = pd.DataFrame(data, columns=columns)
  8. # Find the two greatest numbers in column [A][Y] or [B][Y] or [C][Y]
  9. greatest_y_values = df[['A', 'B', 'C']]['Y'].nlargest(2)
  10. # Select the corresponding column [A][X] or [B][X] or [C][X]
  11. result = df.loc[greatest_y_values.index][['A', 'B', 'C']]['X']
  12. # Print the result
  13. print(result)

This code will give you the two greatest values in columns [A][Y], [B][Y], or [C][Y], and then it selects the corresponding values from columns [A][X], [B][X], or [C][X].

英文:
  1. import pandas as pd
  2. import numpy as np
  3. level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
  4. level_1 = ['A', 'A', 'B', 'B', 'C', 'C']
  5. data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78], ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]
  6. columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
  7. df = pd.DataFrame(data, columns=columns)

I'm very new to python, so apologies for the basic nature of the question. I have the above dataframe. I would like to create another 2 columns based on the 2 greatest numbers in column [A][Y] or [B][Y] or [C][Y] and then select the corresponding column [A][X] or [B][X] or [C][X]. Any help would be greatly appreciated.

在多层次数据框中创建一个条件列

I've tried argsort but haven't been able to figure out how to reference the correct corresponding column.

答案1

得分: 2

也许不是最漂亮的解决方案,但它完成了任务(主要函数是 Series.nlargest):

  1. def fn(x):
  2. x = x.nlargest(2)
  3. a, b = x
  4. ia, ib = x.index
  5. return {
  6. ("Greatest1", "X"): f"{df_x.loc[x.name, ia]}",
  7. ("Greatest1", "Y"): a,
  8. ("Greatest2", "X"): f"{df_x.loc[x.name, ib]}",
  9. ("Greatest2", "Y"): b,
  10. }
  11. df_x = df.xs("X", axis=1, level=1)
  12. x = df.xs("Y", axis=1, level=1).apply(fn, axis=1, result_type="expand")
  13. df = pd.concat([df, x], axis=1)
  14. print(df)

输出:

  1. A B C Greatest1 Greatest2
  2. X Y X Y X Y X Y X Y
  3. 0 a1 2 b1 4 c1 3 b1 4 c1 3
  4. 1 a2 16 b2 48 c2 78 c2 78 b2 48
  5. 2 a3 10 b3 12 c3 34 c3 34 b3 12
  6. 3 a4 114 b4 6 c4 1 a4 114 b4 6
英文:

Maybe not the prettiest solution but it gets job done (the main function is Series.nlargest):

  1. def fn(x):
  2. x = x.nlargest(2)
  3. a, b = x
  4. ia, ib = x.index
  5. return {
  6. ("Greatest1", "X"): f"{df_x.loc[x.name, ia]}",
  7. ("Greatest1", "Y"): a,
  8. ("Greatest2", "X"): f"{df_x.loc[x.name, ib]}",
  9. ("Greatest2", "Y"): b,
  10. }
  11. df_x = df.xs("X", axis=1, level=1)
  12. x = df.xs("Y", axis=1, level=1).apply(fn, axis=1, result_type="expand")
  13. df = pd.concat([df, x], axis=1)
  14. print(df)

Prints:

  1. A B C Greatest1 Greatest2
  2. X Y X Y X Y X Y X Y
  3. 0 a1 2 b1 4 c1 3 b1 4 c1 3
  4. 1 a2 16 b2 48 c2 78 c2 78 b2 48
  5. 2 a3 10 b3 12 c3 34 c3 34 b3 12
  6. 3 a4 114 b4 6 c4 1 a4 114 b4 6

答案2

得分: 2

以下是您要翻译的内容:

  1. Another possible option :
  2. NBOG = 2
  3. lvl1 = df.columns.levels[1]
  4. lst2 = [[f"Greatest{i+1}" for i in range(NBOG)], lvl1]
  5. arrg = (
  6. df.stack(0).set_index(lvl1[0], append=True)
  7. .groupby(level=0, group_keys=False)[lvl1[1]]
  8. .nlargest(NBOG).droplevel(0).reset_index(level=1)
  9. .to_numpy().reshape(-1, len(lvl1)*NBOG)
  10. )
  11. out = df.join(pd.DataFrame(arrg, columns=pd.MultiIndex.from_product(lst2)))
  12. Ouptut :
  13. print(out)
  14. A B C Greatest1 Greatest2
  15. X Y X Y X Y X Y X Y
  16. 0 a1 2 b1 4 c1 3 b1 4 c1 3
  17. 1 a2 16 b2 48 c2 78 c2 78 b2 48
  18. 2 a3 10 b3 12 c3 34 c3 34 b3 12
  19. 3 a4 114 b4 6 c4 1 a4 114 b4 6
英文:

Another possible option :

  1. NBOG = 2
  2. lvl1 = df.columns.levels[1]
  3. lst2 = [[f"Greatest{i+1}" for i in range(NBOG)], lvl1]
  4. arrg = (
  5. df.stack(0).set_index(lvl1[0], append=True)
  6. .groupby(level=0, group_keys=False)[lvl1[1]]
  7. .nlargest(NBOG).droplevel(0).reset_index(level=1)
  8. .to_numpy().reshape(-1, len(lvl1)*NBOG)
  9. )
  10. out = df.join(pd.DataFrame(arrg, columns=pd.MultiIndex.from_product(lst2)))

Ouptut :

  1. print(out)
  2. A B C Greatest1 Greatest2
  3. X Y X Y X Y X Y X Y
  4. 0 a1 2 b1 4 c1 3 b1 4 c1 3
  5. 1 a2 16 b2 48 c2 78 c2 78 b2 48
  6. 2 a3 10 b3 12 c3 34 c3 34 b3 12
  7. 3 a4 114 b4 6 c4 1 a4 114 b4 6

huangapple
  • 本文由 发表于 2023年6月9日 05:48:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76435912.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定