June 9, 2023, 05:48:55

Creating a conditional column in a multi-level DataFrame

Question

import pandas as pd
import numpy as np
level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
level_1 = ['A', 'A', 'B', 'B', 'C', 'C']
data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78], ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df = pd.DataFrame(data, columns=columns)
# For each row, find the two greatest numbers among [A][Y], [B][Y] and [C][Y]
y = df.xs('Y', axis=1, level=1)
x = df.xs('X', axis=1, level=1)
rows = []
for i in df.index:
    labels = y.loc[i].nlargest(2).index
    # Select the corresponding [A][X], [B][X] or [C][X] values
    rows.append([x.loc[i, label] for label in labels])
result = pd.DataFrame(rows, index=df.index, columns=['Greatest1', 'Greatest2'])
# Print the result
print(result)

For each row, this code finds the two greatest values among columns [A][Y], [B][Y] and [C][Y], and then selects the corresponding values from columns [A][X], [B][X] or [C][X].

I'm very new to Python, so apologies for the basic nature of the question. I have the above DataFrame. For each row, I would like to create additional columns based on the two greatest numbers among columns [A][Y], [B][Y] and [C][Y], together with the corresponding values from columns [A][X], [B][X] or [C][X]. Any help would be greatly appreciated.

I've tried argsort but haven't been able to figure out how to reference the correct corresponding column.
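For orientation, the [A][Y] notation in the question maps onto pandas column indexing roughly as sketched below. This is only an illustration built on the question's sample data; the variable names a_y, all_y and all_x are made up for the example.

```python
import pandas as pd

level_1 = ['A', 'A', 'B', 'B', 'C', 'C']
level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78],
        ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df = pd.DataFrame(data, columns=columns)

# A single column is addressed by its (first level, second level) tuple.
a_y = df[('A', 'Y')]

# A cross-section pulls one second-level label out of every first-level
# group, leaving plain columns A, B and C.
all_y = df.xs('Y', axis=1, level=1)
all_x = df.xs('X', axis=1, level=1)

print(a_y.tolist())         # [2, 16, 10, 114]
print(list(all_y.columns))  # ['A', 'B', 'C']
```

Once the Y values sit side by side in all_y, a per-row comparison across A, B and C becomes an ordinary row-wise operation, and the label of the winning column can be used to look up the matching entry in all_x.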

Answer 1

Score: 2

Maybe not the prettiest solution, but it gets the job done (the main function is Series.nlargest):

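def fn(x):
    # x is one row of the "Y" cross-section; keep its two largest values
    x = x.nlargest(2)
    a, b = x
    # the index labels (A, B or C) tell which group each value came from
    ia, ib = x.index
    return {
        ("Greatest1", "X"): f"{df_x.loc[x.name, ia]}",
        ("Greatest1", "Y"): a,
        ("Greatest2", "X"): f"{df_x.loc[x.name, ib]}",
        ("Greatest2", "Y"): b,
    }

# "X" cross-section, used inside fn to look up the matching X values
df_x = df.xs("X", axis=1, level=1)
# apply fn to every row of the "Y" cross-section
x = df.xs("Y", axis=1, level=1).apply(fn, axis=1, result_type="expand")
# append the new columns to the original frame
df = pd.concat([df, x], axis=1)
print(df)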

Output:

    A        B       C     Greatest1      Greatest2    
    X    Y   X   Y   X   Y         X    Y         X   Y
0  a1    2  b1   4  c1   3        b1    4        c1   3
1  a2   16  b2  48  c2  78        c2   78        b2  48
2  a3   10  b3  12  c3  34        c3   34        b3  12
3  a4  114  b4   6  c4   1        a4  114        b4   6
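To see why the index of the nlargest result is enough to find the matching X values, here is a minimal sketch of what Series.nlargest returns for a single row. The row is rebuilt by hand from the Y values of row 1 in the sample data, so the Series below is an assumption for illustration rather than part of the answer's code.

```python
import pandas as pd

# Y values of row 1 for the groups A, B and C (copied from the sample data).
row = pd.Series({'A': 16, 'B': 48, 'C': 78}, name=1)

# nlargest returns the values in descending order and keeps their labels.
top2 = row.nlargest(2)

print(list(top2))        # [78, 48]
print(list(top2.index))  # ['C', 'B']
print(top2.name)         # 1
```

The labels C and B are the columns to read from the X cross-section, and the Series name is the row label, which is what fn uses in df_x.loc[x.name, ia].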

Answer 2

Score: 2

Another possible option:
NBOG = 2
lvl1 = df.columns.levels[1]
lst2 = [[f"Greatest{i+1}" for i in range(NBOG)], lvl1]

arrg = (
    df.stack(0).set_index(lvl1[0], append=True)
        .groupby(level=0, group_keys=False)[lvl1[1]]
        .nlargest(NBOG).droplevel(0).reset_index(level=1)
        .to_numpy().reshape(-1, len(lvl1)*NBOG)
)

out = df.join(pd.DataFrame(arrg, columns=pd.MultiIndex.from_product(lst2)))
print(out)

Output:

    A        B       C     Greatest1      Greatest2    
    X    Y   X   Y   X   Y         X    Y         X   Y
0  a1    2  b1   4  c1   3        b1    4        c1   3
1  a2   16  b2  48  c2  78        c2   78        b2  48
2  a3   10  b3  12  c3  34        c3   34        b3  12
3  a4  114  b4   6  c4   1        a4  114        b4   6
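The argsort idea mentioned in the question can probably be made to work as well. The sketch below is not taken from either answer; it assumes the Y columns are numeric, and the names n, order, top_x, top_y and values are made up for the example. The key step is to reuse the same column positions from np.argsort to pick entries from both the Y and the X cross-sections with np.take_along_axis.

```python
import numpy as np
import pandas as pd

level_1 = ['A', 'A', 'B', 'B', 'C', 'C']
level_2 = ['X', 'Y', 'X', 'Y', 'X', 'Y']
data = [['a1', 2, 'b1', 4, 'c1', 3], ['a2', 16, 'b2', 48, 'c2', 78],
        ['a3', 10, 'b3', 12, 'c3', 34], ['a4', 114, 'b4', 6, 'c4', 1]]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df = pd.DataFrame(data, columns=columns)

n = 2  # how many of the greatest values to keep (assumed parameter)

y = df.xs('Y', axis=1, level=1).to_numpy()
x = df.xs('X', axis=1, level=1).to_numpy()

# Column positions of the n largest Y values in each row, largest first.
# Negating the values turns the ascending sort into a descending one.
order = np.argsort(-y, axis=1, kind='stable')[:, :n]

# Use the same positions to pick from both cross-sections.
top_y = np.take_along_axis(y, order, axis=1)
top_x = np.take_along_axis(x, order, axis=1)

# Interleave X and Y: (Greatest1, X), (Greatest1, Y), (Greatest2, X), ...
values = np.empty((len(df), 2 * n), dtype=object)
values[:, 0::2] = top_x
values[:, 1::2] = top_y

new_cols = pd.MultiIndex.from_product(
    [[f'Greatest{i + 1}' for i in range(n)], ['X', 'Y']]
)
out = pd.concat([df, pd.DataFrame(values, index=df.index, columns=new_cols)], axis=1)
print(out)
```

Because the values are collected in an object array, the new Y columns end up with object dtype; converting them back with astype is left out to keep the sketch short.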
