在多层次数据框中创建一个条件列

huangapple go评论59阅读模式
英文:

creating a conditional column in a multi-level dataframe

问题

import pandas as pd
import numpy as np

level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
level_1 = ['A', 'A', 'B', 'B', 'C', 'C']

data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78], ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]

columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))

df = pd.DataFrame(data, columns=columns)

# Find the two greatest numbers in column [A][Y] or [B][Y] or [C][Y]
greatest_y_values = df[['A', 'B', 'C']]['Y'].nlargest(2)

# Select the corresponding column [A][X] or [B][X] or [C][X]
result = df.loc[greatest_y_values.index][['A', 'B', 'C']]['X']

# Print the result
print(result)

This code will give you the two greatest values in columns [A][Y], [B][Y], or [C][Y], and then it selects the corresponding values from columns [A][X], [B][X], or [C][X].

英文:
import pandas as pd
import numpy as np

level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
level_1 = ['A', 'A', 'B', 'B', 'C', 'C']

data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78], ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]

columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))

df = pd.DataFrame(data, columns=columns)

I'm very new to python, so apologies for the basic nature of the question. I have the above dataframe. I would like to create another 2 columns based on the 2 greatest numbers in column [A][Y] or [B][Y] or [C][Y] and then select the corresponding column [A][X] or [B][X] or [C][X]. Any help would be greatly appreciated.

在多层次数据框中创建一个条件列

I've tried argsort but haven't been able to figure out how to reference the correct corresponding column.

答案1

得分: 2

也许不是最漂亮的解决方案,但它完成了任务(主要函数是 Series.nlargest):

def fn(x):
    x = x.nlargest(2)

    a, b = x
    ia, ib = x.index

    return {
        ("Greatest1", "X"): f"{df_x.loc[x.name, ia]}",
        ("Greatest1", "Y"): a,
        ("Greatest2", "X"): f"{df_x.loc[x.name, ib]}",
        ("Greatest2", "Y"): b,
    }

df_x = df.xs("X", axis=1, level=1)
x = df.xs("Y", axis=1, level=1).apply(fn, axis=1, result_type="expand")
df = pd.concat([df, x], axis=1)
print(df)

输出:

    A        B       C     Greatest1      Greatest2    
    X    Y   X   Y   X   Y         X    Y         X   Y
0  a1    2  b1   4  c1   3        b1    4        c1   3
1  a2   16  b2  48  c2  78        c2   78        b2  48
2  a3   10  b3  12  c3  34        c3   34        b3  12
3  a4  114  b4   6  c4   1        a4  114        b4   6
英文:

Maybe not the prettiest solution but it gets job done (the main function is Series.nlargest):

def fn(x):
    x = x.nlargest(2)

    a, b = x
    ia, ib = x.index

    return {
        ("Greatest1", "X"): f"{df_x.loc[x.name, ia]}",
        ("Greatest1", "Y"): a,
        ("Greatest2", "X"): f"{df_x.loc[x.name, ib]}",
        ("Greatest2", "Y"): b,
    }

df_x = df.xs("X", axis=1, level=1)
x = df.xs("Y", axis=1, level=1).apply(fn, axis=1, result_type="expand")
df = pd.concat([df, x], axis=1)
print(df)

Prints:

    A        B       C     Greatest1      Greatest2    
    X    Y   X   Y   X   Y         X    Y         X   Y
0  a1    2  b1   4  c1   3        b1    4        c1   3
1  a2   16  b2  48  c2  78        c2   78        b2  48
2  a3   10  b3  12  c3  34        c3   34        b3  12
3  a4  114  b4   6  c4   1        a4  114        b4   6

答案2

得分: 2

以下是您要翻译的内容:

Another possible option :

    NBOG = 2
    lvl1 = df.columns.levels[1]
    lst2 = [[f"Greatest{i+1}" for i in range(NBOG)], lvl1]
    
    arrg = (
        df.stack(0).set_index(lvl1[0], append=True)
            .groupby(level=0, group_keys=False)[lvl1[1]]
            .nlargest(NBOG).droplevel(0).reset_index(level=1)
            .to_numpy().reshape(-1, len(lvl1)*NBOG)
    )
    
    out = df.join(pd.DataFrame(arrg, columns=pd.MultiIndex.from_product(lst2)))

Ouptut :

    print(out)
    
        A        B       C     Greatest1      Greatest2    
        X    Y   X   Y   X   Y         X    Y         X   Y
    0  a1    2  b1   4  c1   3        b1    4        c1   3
    1  a2   16  b2  48  c2  78        c2   78        b2  48
    2  a3   10  b3  12  c3  34        c3   34        b3  12
    3  a4  114  b4   6  c4   1        a4  114        b4   6
英文:

Another possible option :

NBOG = 2
lvl1 = df.columns.levels[1]
lst2 = [[f"Greatest{i+1}" for i in range(NBOG)], lvl1]

arrg = (
    df.stack(0).set_index(lvl1[0], append=True)
        .groupby(level=0, group_keys=False)[lvl1[1]]
        .nlargest(NBOG).droplevel(0).reset_index(level=1)
        .to_numpy().reshape(-1, len(lvl1)*NBOG)
)

out = df.join(pd.DataFrame(arrg, columns=pd.MultiIndex.from_product(lst2)))

Ouptut :

print(out)

    A        B       C     Greatest1      Greatest2    
    X    Y   X   Y   X   Y         X    Y         X   Y
0  a1    2  b1   4  c1   3        b1    4        c1   3
1  a2   16  b2  48  c2  78        c2   78        b2  48
2  a3   10  b3  12  c3  34        c3   34        b3  12
3  a4  114  b4   6  c4   1        a4  114        b4   6

huangapple
  • 本文由 发表于 2023年6月9日 05:48:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76435912.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定