Pandas Sorting with condition

Question

Suppose I have the following dataframe:

price  position  datetime
   30         1  2023-01-08
   30         4  2023-01-08
  140         2  2023-01-09
  140         3  2023-01-09
   50         1  2023-01-09
   50         4  2023-01-09
  260         2  2023-01-09
  260         3  2023-01-09

Both 'position' and 'datetime' are in ascending order.

I want specific values in 'position' (1 and 4) to be sorted in descending order within the same 'datetime' value, so that the end result looks like this:

price  position  datetime
   30         4  2023-01-08
   30         1  2023-01-08
  140         2  2023-01-09
  140         3  2023-01-09
   50         4  2023-01-09
   50         1  2023-01-09
  260         2  2023-01-09
  260         3  2023-01-09

I tried this code:

df_trade = df_trade[df_trade.position == 4 or df_trade.position == 1].sort_values(by = ["position"], ascending = False)

but it raises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I guess there must be a better, easier way.
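
A note on the error itself: Python's `or` needs a single True/False value from its left operand, and a whole Series does not have one, hence the ValueError. Element-wise masks are combined with `|` (each comparison in parentheses) or built with `isin`. The sketch below rebuilds the example data under the assumption that 'datetime' holds plain date strings. Note that filtering this way keeps only the rows with position 1 or 4 and drops the rest, which is probably not what is wanted here.

```python
import pandas as pd

# Example data from the question; the dtypes are an assumption.
df_trade = pd.DataFrame({
    "price": [30, 30, 140, 140, 50, 50, 260, 260],
    "position": [1, 4, 2, 3, 1, 4, 2, 3],
    "datetime": ["2023-01-08", "2023-01-08"] + ["2023-01-09"] * 6,
})

# `or` calls bool() on the left-hand Series, which raises ValueError.
try:
    df_trade[df_trade.position == 4 or df_trade.position == 1]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Element-wise alternatives: `|` with parentheses, or `isin`.
mask_or = (df_trade["position"] == 4) | (df_trade["position"] == 1)
mask_isin = df_trade["position"].isin([1, 4])

# Both masks select the same rows, but only the 1/4 rows survive.
selected = df_trade[mask_isin].sort_values(by="position", ascending=False)
print(error_message)
print(selected)
```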

Answer 1

Score: 1

Use numpy.lexsort with a key built from the "position" Series in which the values 1 and 4 are negated, so that sorting the key in ascending order puts 4 before 1:

import numpy as np

order = np.lexsort([df['position'].mul(np.where(df['position'].isin([1,4]), -1, 1)), df['datetime']])

out = df.iloc[order]
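
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The dataframe construction and the `pd.to_datetime` conversion are assumptions, since the question does not show how `df` was built. Keep in mind that `np.lexsort` treats the last key in the list as the primary one, which is why 'datetime' comes second.

```python
import numpy as np
import pandas as pd

# Example data from the question; converting to datetime64 is an assumption.
df = pd.DataFrame({
    "price": [30, 30, 140, 140, 50, 50, 260, 260],
    "position": [1, 4, 2, 3, 1, 4, 2, 3],
    "datetime": pd.to_datetime(["2023-01-08", "2023-01-08"] + ["2023-01-09"] * 6),
})

# Negate positions 1 and 4 so an ascending sort places 4 before 1.
sign = np.where(df["position"].isin([1, 4]), -1, 1)
key = df["position"].mul(sign).to_numpy()  # [-1, -4, 2, 3, -1, -4, 2, 3]

# lexsort sorts by the LAST key first: datetime is primary, key is secondary.
order = np.lexsort([key, df["datetime"].to_numpy()])

out = df.iloc[order]
print(out)
```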

Pandas equivalent:

(df.assign(key=df['position'].mul(np.where(df['position'].isin([1,4]), -1, 1)))
   .sort_values(by=['datetime', 'key'])
   .drop(columns='key')
 )

Output:

   price  position    datetime
1     30         4  2023-01-08
0     30         1  2023-01-08
5     50         4  2023-01-09
4     50         1  2023-01-09
2    140         2  2023-01-09
6    260         2  2023-01-09
3    140         3  2023-01-09
7    260         3  2023-01-09
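
Note that this output places the 4/1 rows ahead of all 2/3 rows within each date, whereas the expected result in the question leaves the 2/3 rows where they are. If the intent is to reorder only the rows holding 1 or 4 and keep every other row in its slot, one possible sketch (not part of the answer above, and assuming the same example data) is to sort just that subset and write the new order back into the same row slots:

```python
import numpy as np
import pandas as pd

# Example data from the question; converting to datetime64 is an assumption.
df = pd.DataFrame({
    "price": [30, 30, 140, 140, 50, 50, 260, 260],
    "position": [1, 4, 2, 3, 1, 4, 2, 3],
    "datetime": pd.to_datetime(["2023-01-08", "2023-01-08"] + ["2023-01-09"] * 6),
})

# Row slots that are allowed to change: those holding position 1 or 4.
mask = df["position"].isin([1, 4]).to_numpy()
sub = df[mask]

# Within the subset: datetime ascending (primary), position descending.
sub_order = np.lexsort([-sub["position"].to_numpy(), sub["datetime"].to_numpy()])

# Start from the identity order, then permute only the masked slots.
pos = np.arange(len(df))
pos[mask] = pos[mask][sub_order]

out = df.iloc[pos]
print(out)
```

With the example data this reproduces the layout requested in the question; on other data, the 1/4 rows are redistributed only among the slots that 1/4 rows already occupy.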

huangapple
  • Posted on January 9, 2023, 11:11:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052856.html