Pandas DataFrame: select row-wise max value in absolute terms
Question
I have a DataFrame with only numeric data:
[In1]: df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])
       df
[Out1]:
      A     B     C
0 -0.27  1.22  1.10
1 -3.22  0.48 -1.64
2  1.42  0.24 -0.12
3 -1.12  0.44  0.23
4  1.88 -0.38  0.62
How do I select, for each row, the max value in absolute terms while preserving the sign?
In this case it would be:
0    1.22
1   -3.22
2    1.42
3   -1.12
4    1.88
I got as far as determining which column to use:
[In2]: loc_max = df.abs().idxmax(axis=1)
       loc_max
[Out2]:
0    B
1    A
2    A
3    A
4    A
Performance is important because my actual dataframe is big.
SOLUTIONS COMPARISON:
All answers below will give the desired outcome.
Performance comparison on a slightly bigger dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 100).round(2))

def numpy_argmax():
    # Column position of the largest |value| per row, then pick it from the raw array
    idx_max = np.abs(df.values).argmax(axis=1)
    val = df.values[range(len(df)), idx_max]
    return pd.Series(val, index=df.index)

def check_sign():
    # Flip the sign when the row does not contain +row_max, i.e. the extreme value is negative
    row_max = df.abs().max(axis=1)
    return row_max * (-1) ** df.ne(row_max, axis=0).all(axis=1)

def loop_rows():
    return df.apply(lambda s: s[s.abs().idxmax()], axis=1)

def pandas_loc():
    s = df.abs().idxmax(axis=1)
    val = [df.loc[x, y] for x, y in zip(s.index, s)]
    return pd.Series(val, index=df.index)

# IPython magics: run these in IPython or Jupyter
%timeit numpy_argmax()
%timeit check_sign()
%timeit loop_rows()
%timeit pandas_loc()
Results:
As usual, going to the NumPy level behind the pandas curtain achieves the best performance. (Let me know if that's not always true.)
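For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch that uses the standard-library timeit module instead of the %timeit magic. It first checks that the candidates agree and then times them. The smaller frame size, the seed, the frame argument on each function, and the names candidates, all_agree and timings are my own choices for illustration, not part of the original benchmark; check_sign is left out only to keep the sketch short.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible (seed and shape are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 20)).round(2))


def numpy_argmax(frame):
    values = frame.to_numpy()
    idx_max = np.abs(values).argmax(axis=1)
    return pd.Series(values[np.arange(len(frame)), idx_max], index=frame.index)


def loop_rows(frame):
    return frame.apply(lambda s: s[s.abs().idxmax()], axis=1)


def pandas_loc(frame):
    s = frame.abs().idxmax(axis=1)
    vals = [frame.loc[x, y] for x, y in zip(s.index, s)]
    return pd.Series(vals, index=frame.index)


candidates = {
    "numpy_argmax": numpy_argmax,
    "loop_rows": loop_rows,
    "pandas_loc": pandas_loc,
}

# Check that every candidate matches the NumPy version before timing anything.
reference = numpy_argmax(df)
all_agree = all(
    np.array_equal(reference.to_numpy(), fn(df).to_numpy())
    for fn in candidates.values()
)

# Average wall-clock seconds per call over a few repetitions.
timings = {
    name: timeit.timeit(lambda fn=fn: fn(df), number=3) / 3
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Absolute numbers depend on the machine and on the pandas and NumPy versions, so only the relative ordering is meaningful.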
Answer 1
Score: 3
Let us use argmax on the absolute values to find the indices of the maximum absolute values. Then use these indices to get the corresponding values from each row.
v = df.values
v[range(len(v)), np.abs(v).argmax(axis=1)]
array([ 1.22, -3.22, 1.42, -1.12, 1.88])
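As a self-contained check, the same idea can be run on the exact values from the question and wrapped back into a Series so that the original index is kept. This is only a sketch: the names col_pos and signed_max are mine, and to_numpy() is used in place of .values (both return the underlying array here).

```python
import numpy as np
import pandas as pd

# The sample frame from the question, typed in by hand.
df = pd.DataFrame(
    [[-0.27, 1.22, 1.10],
     [-3.22, 0.48, -1.64],
     [1.42, 0.24, -0.12],
     [-1.12, 0.44, 0.23],
     [1.88, -0.38, 0.62]],
    columns=["A", "B", "C"],
)

v = df.to_numpy()
# Column position of the largest absolute value in each row.
col_pos = np.abs(v).argmax(axis=1)
# Pair row i with column col_pos[i]; the picked values keep their sign.
signed_max = pd.Series(v[np.arange(len(v)), col_pos], index=df.index)

print(signed_max.tolist())  # [1.22, -3.22, 1.42, -1.12, 1.88]
```

If a row contains both +x and -x as its largest absolute value, argmax returns the first occurrence, so the leftmost column wins.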
Answer 2
Score: 1
A method that builds on your attempt, using numpy indices and numpy broadcasting rules:
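import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3).round(2), columns=['A', 'B', 'C'])
# Column position of the largest absolute value in each row
idx_max = np.argmax(df.abs().values, axis=1)
# Pair each row number with its column position
df.values[range(len(df)), idx_max]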
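To make the indexing step more concrete, the sketch below shows two equivalent ways of picking one element per row from a plain array: integer-array indexing, where the row and column index arrays are paired element by element, and np.take_along_axis, which expects an index array with the same number of dimensions as the data. The small array and the names picked_fancy and picked_take are illustrative only.

```python
import numpy as np

# First three rows of the question's sample data.
v = np.array([
    [-0.27, 1.22, 1.10],
    [-3.22, 0.48, -1.64],
    [1.42, 0.24, -0.12],
])

idx_max = np.abs(v).argmax(axis=1)  # one column position per row

# Integer-array indexing: element i of the result is v[i, idx_max[i]].
picked_fancy = v[np.arange(len(v)), idx_max]

# take_along_axis needs a 2-D index array, hence the added axis.
picked_take = np.take_along_axis(v, idx_max[:, None], axis=1).ravel()

print(picked_fancy.tolist())  # [1.22, -3.22, 1.42]
```

Both forms return a one-dimensional array with one value per row.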
Answer 3
Score: 0
Another possible solution:
s = df.abs().idxmax(axis=1)
[df.loc[x, y] for x, y in zip(s.index, s)]
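The list comprehension performs one df.loc lookup per row, which tends to be slow on large frames. A possible variation, sketched here under the assumption that the column labels are unique, is to translate the labels returned by idxmax into integer positions with Index.get_indexer and then index the underlying array once. The string row labels and the names col_pos and vals are made up for the example.

```python
import numpy as np
import pandas as pd

# Two rows from the question's sample data, with hypothetical row labels.
df = pd.DataFrame(
    [[-0.27, 1.22, 1.10],
     [-3.22, 0.48, -1.64]],
    columns=["A", "B", "C"],
    index=["x", "y"],
)

s = df.abs().idxmax(axis=1)                      # column label per row
col_pos = df.columns.get_indexer(s.to_numpy())   # labels -> integer positions
vals = pd.Series(df.to_numpy()[np.arange(len(df)), col_pos], index=df.index)

print(vals.tolist())  # [1.22, -3.22]
```

The result is a Series aligned with the original index rather than a plain list.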