Pandas DataFrame: select row-wise max value in absolute terms

Question

I have a DataFrame with only numeric data:

[ In1]: import numpy as np
        import pandas as pd
        df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])
        df

[Out1]:       A     B     C
        0 -0.27  1.22  1.10
        1 -3.22  0.48 -1.64
        2  1.42  0.24 -0.12
        3 -1.12  0.44  0.23
        4  1.88 -0.38  0.62

How do I select, for each row, the max value in absolute terms while preserving the sign?

In this case it would be:

0     1.22
1    -3.22
2     1.42
3    -1.12
4     1.88

I got as far as determining which column to use:

[ In2]: loc_max = df.abs().idxmax(axis=1)
        loc_max

[Out2]: 
        0    B
        1    A
        2    A
        3    A
        4    A

Performance is important because my actual dataframe is big.
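A sketch (not from the original thread) of finishing from loc_max without a per-row Python loop: Index.get_indexer turns the column labels into integer positions, and NumPy integer-array indexing then takes one signed value per row. The DataFrame is hard-coded to the values printed above, since np.random.randn gives different numbers on every run.

```python
import numpy as np
import pandas as pd

# The question's example values, hard-coded instead of np.random.randn
df = pd.DataFrame(
    [[-0.27, 1.22, 1.10],
     [-3.22, 0.48, -1.64],
     [1.42, 0.24, -0.12],
     [-1.12, 0.44, 0.23],
     [1.88, -0.38, 0.62]],
    columns=['A', 'B', 'C'],
)

# Column label of the largest |value| in each row (as in the question)
loc_max = df.abs().idxmax(axis=1)

# Translate the labels into integer column positions
col_pos = df.columns.get_indexer(loc_max)

# Take row i, column col_pos[i] for every row; indexing into df itself
# (not df.abs()) is what keeps the original sign
result = pd.Series(df.to_numpy()[np.arange(len(df)), col_pos], index=df.index)
print(result.tolist())  # [1.22, -3.22, 1.42, -1.12, 1.88]
```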


SOLUTIONS COMPARISON:

All answers below will give the desired outcome.

Performance comparison on a slightly bigger dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 100).round(2))

def numpy_argmax():
    idx_max = np.abs(df.values).argmax(axis=1)
    val = df.values[range(len(df)), idx_max]
    return pd.Series(val, index=df.index)

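def check_sign():
    # largest |value| in each row
    row_max = df.abs().max(axis=1)
    # if no cell equals +row_max, the extreme value must be -row_max; (-1) ** True == -1
    return row_max * (-1) ** df.ne(row_max, axis=0).all(axis=1)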

def loop_rows():
    return df.apply(lambda s: s[s.abs().idxmax()], axis=1)

def pandas_loc():
    s = df.abs().idxmax(axis=1)
    val = [df.loc[x, y] for x, y in zip(s.index, s)]
    return pd.Series(val, index=df.index)

%timeit numpy_argmax()
%timeit check_sign()
%timeit loop_rows()
%timeit pandas_loc()

Results:

[Image: %timeit results]

As usual, going down to the NumPy level behind the pandas curtain gives the best performance. (Let me know if that's not always true.)
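The %timeit lines above are IPython magics. To rerun the comparison from a plain Python script, for example at other shapes to probe whether the NumPy version always wins, here is a rough sketch using the standard-library timeit module. It is an assumption-laden stand-in, not the original benchmark: the functions take the DataFrame as an argument instead of reading a global, the seed is arbitrary, and an agreement check runs first. Note that check_sign returns +m when a row contains both +m and -m as its largest absolute value, while the argmax/idxmax versions return whichever comes first, so check_sign is compared on absolute values only.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the comparison above; seed 0 is an arbitrary choice for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 100)).round(2))


def numpy_argmax(frame):
    v = frame.to_numpy()
    idx_max = np.abs(v).argmax(axis=1)
    return pd.Series(v[np.arange(len(v)), idx_max], index=frame.index)


def check_sign(frame):
    row_max = frame.abs().max(axis=1)
    return row_max * (-1) ** frame.ne(row_max, axis=0).all(axis=1)


def loop_rows(frame):
    return frame.apply(lambda s: s[s.abs().idxmax()], axis=1)


def pandas_loc(frame):
    s = frame.abs().idxmax(axis=1)
    val = [frame.loc[x, y] for x, y in zip(s.index, s)]
    return pd.Series(val, index=frame.index)


funcs = [numpy_argmax, check_sign, loop_rows, pandas_loc]

# Agreement check: argmax and idxmax both pick the first maximal cell, so these
# match exactly; check_sign is only compared on absolute values (see above)
expected = numpy_argmax(df)
for f in (loop_rows, pandas_loc):
    assert np.array_equal(f(df).to_numpy(), expected.to_numpy())
assert np.array_equal(check_sign(df).abs().to_numpy(), expected.abs().to_numpy())

# Rough stand-in for %timeit: best of 3 repeats of 3 calls, reported per call
timings = {}
for f in funcs:
    best = min(timeit.repeat(lambda: f(df), number=3, repeat=3))
    timings[f.__name__] = best / 3
    print(f"{f.__name__:>13}: {timings[f.__name__] * 1e3:.2f} ms per call")
```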

Answer 1

Score: 3

Let's use argmax on the absolute values to find, in each row, the column position of the largest absolute value. Then use these positions to take the corresponding signed value from each row.

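import numpy as np

v = df.values
# for each row, take the column where |value| is largest; the signed value is returned
v[range(len(v)), np.abs(v).argmax(axis=1)]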

array([ 1.22, -3.22,  1.42, -1.12,  1.88])
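This answer returns a plain NumPy array, while the question's expected output is a Series. A minimal sketch of wrapping the same lookup so the result keeps the DataFrame's row labels; the labels r0 to r4 are hypothetical, added only to show that a non-default index carries over.

```python
import numpy as np
import pandas as pd

# The question's example values; the row labels 'r0'..'r4' are hypothetical
df = pd.DataFrame(
    [[-0.27, 1.22, 1.10],
     [-3.22, 0.48, -1.64],
     [1.42, 0.24, -0.12],
     [-1.12, 0.44, 0.23],
     [1.88, -0.38, 0.62]],
    columns=['A', 'B', 'C'],
    index=['r0', 'r1', 'r2', 'r3', 'r4'],
)

v = df.to_numpy()
cols = np.abs(v).argmax(axis=1)  # column position of the largest |value| per row
# Same lookup as the answer, wrapped in a Series that reuses df's row labels
signed_max = pd.Series(v[np.arange(len(v)), cols], index=df.index)
print(signed_max.tolist())  # [1.22, -3.22, 1.42, -1.12, 1.88]
```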

Answer 2

Score: 1

A method that builds on your attempt, using NumPy integer-array indexing (the row and column index arrays are broadcast together):

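import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])
# argmax on the underlying array; np.argmax on the DataFrame itself can fail when
# pandas tries to wrap the 1-D result back into a DataFrame
idx_max = np.argmax(df.abs().values, axis=1)
df.values[range(len(df)), idx_max]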
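The same idea can also be written with np.take_along_axis instead of pairing a row range with the column indices. This is a swapped-in NumPy function, not part of the answer above; it expects the index array to have as many dimensions as the data, hence the [:, None] reshape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])

v = df.to_numpy()
# Shape (n_rows, 1): column position of the largest |value| in each row
idx_max = np.abs(v).argmax(axis=1)[:, None]
# Gather one element per row along axis 1, then drop the length-1 axis
signed_max = pd.Series(np.take_along_axis(v, idx_max, axis=1).ravel(), index=df.index)
```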

Answer 3

Score: 0

Another possible solution:

s = df.abs().idxmax(axis=1)
[df.loc[x, y] for x, y in zip(s.index, s)]
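This answer builds a Python list. A sketch of the same loop that returns a Series on the original index and uses DataFrame.at, the pandas accessor intended for single-value lookups by label. It still performs one Python-level lookup per row, so it should not be expected to keep up with the NumPy-based answers on a large frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])

# Column label of the largest |value| in each row
s = df.abs().idxmax(axis=1)
# One scalar lookup per (row label, column label) pair, kept on df's index
signed_max = pd.Series([df.at[row, col] for row, col in s.items()], index=df.index)
```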

huangapple
  • Posted on 27 July 2023, 18:40:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76778934.html