Pandas Sorting with condition


Question


Suppose I have the following DataFrame:

```
price  position    datetime
   30         1  2023-01-08
   30         4  2023-01-08
  140         2  2023-01-09
  140         3  2023-01-09
   50         1  2023-01-09
   50         4  2023-01-09
  260         2  2023-01-09
  260         3  2023-01-09
```
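To make the examples reproducible, here is one way to build the sample frame. Parsing 'datetime' with `pd.to_datetime` is an assumption; the question does not say whether the column holds strings or real datetime values.

```python
import pandas as pd

# Sample data from the question; 'datetime' is parsed into a datetime64
# column (an assumption) so that sorting is chronological, not lexical.
df = pd.DataFrame(
    {
        "price": [30, 30, 140, 140, 50, 50, 260, 260],
        "position": [1, 4, 2, 3, 1, 4, 2, 3],
        "datetime": pd.to_datetime(["2023-01-08"] * 2 + ["2023-01-09"] * 6),
    }
)
print(df)
```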

The dataframe has both 'position' and 'datetime' in ascending order.

I want specific values in 'position' (1 and 4) to be sorted in descending order within the same 'datetime' value, so that the end result is as follows:

```
price  position    datetime
   30         4  2023-01-08
   30         1  2023-01-08
  140         2  2023-01-09
  140         3  2023-01-09
   50         4  2023-01-09
   50         1  2023-01-09
  260         2  2023-01-09
  260         3  2023-01-09
```

I tried this code:

```python
df_trade = df_trade[df_trade.position == 4 or df_trade.position == 1].sort_values(by=["position"], ascending=False)
```

but it raises:

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
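The error comes from Python's `or`: it calls `bool()` on the left-hand Series, and pandas refuses to reduce a multi-element Series to a single truth value. The sketch below uses a hypothetical one-column `df_trade` to show the element-wise alternatives, `|` with parentheses or `Series.isin`. Note that even the corrected mask only filters rows, so positions 2 and 3 are dropped rather than reordered.

```python
import pandas as pd

# Hypothetical stand-in for the real df_trade (only 'position' matters here)
df_trade = pd.DataFrame({"position": [1, 4, 2, 3, 1, 4, 2, 3]})

# `a or b` calls bool(a); a multi-element Series has no single truth value,
# which is exactly what raises the ValueError.
try:
    df_trade[df_trade.position == 4 or df_trade.position == 1]
except ValueError as exc:
    print("ValueError:", exc)

# Element-wise OR needs `|`, and parentheses because `|` binds tighter than `==`
mask = (df_trade.position == 4) | (df_trade.position == 1)
# Series.isin builds the same boolean mask
assert mask.equals(df_trade.position.isin([1, 4]))

# This filters the frame: only the 1/4 rows survive, sorted descending
filtered = df_trade[mask].sort_values(by="position", ascending=False)
print(filtered)
```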

I suspect there is a better, simpler way.

Answer 1

Score: 1


Use `numpy.lexsort` with a modified "position" key in which positions 1 and 4 are negated, so that 4 (as -4) sorts before 1 (as -1) while the other positions keep their ascending order:

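```python
import numpy as np
# Negate positions 1 and 4; np.lexsort uses the LAST key ('datetime') as the primary key
key = df['position'].mul(np.where(df['position'].isin([1, 4]), -1, 1))
order = np.lexsort([key, df['datetime']])
out = df.iloc[order]
```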
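The behaviour of this key is easier to see on plain NumPy arrays. In this toy run, `date_code` is a hypothetical integer stand-in for the 'datetime' column (0 for 2023-01-08, 1 for 2023-01-09). `np.lexsort` performs a stable indirect sort and treats the last key in the sequence as the primary one.

```python
import numpy as np

position = np.array([1, 4, 2, 3, 1, 4, 2, 3])
date_code = np.array([0, 0, 1, 1, 1, 1, 1, 1])  # hypothetical stand-in for 'datetime'

# Negate positions 1 and 4 so that, between them, 4 (-> -4) comes before 1 (-> -1)
key = position * np.where(np.isin(position, [1, 4]), -1, 1)

# Primary key: date_code (last in the list); tie-breaker: key
order = np.lexsort([key, date_code])
print(order)            # [1 0 5 4 2 6 3 7]
print(position[order])  # [4 1 4 1 2 2 3 3]
```

Because the sort is stable, rows with equal keys (for example the two position-2 rows on 2023-01-09) keep their original relative order.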

Pandas equivalent:

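```python
import numpy as np

# Temporary sort column: positions 1 and 4 negated, all others unchanged
out = (df.assign(key=df['position'].mul(np.where(df['position'].isin([1, 4]), -1, 1)))
         .sort_values(by=['datetime', 'key'])
         .drop(columns='key')
      )
```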
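If you prefer to avoid the temporary column, `DataFrame.sort_values` accepts a `key` callable (pandas 1.1 and later) that is applied to each column in `by` independently. A minimal sketch of the same sign-flip idea under that approach; the helper name `flip_1_and_4` is illustrative only, and the sample frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "price": [30, 30, 140, 140, 50, 50, 260, 260],
        "position": [1, 4, 2, 3, 1, 4, 2, 3],
        "datetime": pd.to_datetime(["2023-01-08"] * 2 + ["2023-01-09"] * 6),
    }
)

def flip_1_and_4(col: pd.Series) -> pd.Series:
    # Called once per column in `by`; only 'position' is transformed.
    if col.name == "position":
        # Keep values that are not 1/4, replace 1/4 with their negation
        return col.where(~col.isin([1, 4]), -col)
    return col

# The key only affects the sort order; the returned frame keeps the original values
out = df.sort_values(by=["datetime", "position"], key=flip_1_and_4)
print(out)
```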

Output:

```
   price  position    datetime
1     30         4  2023-01-08
0     30         1  2023-01-08
5     50         4  2023-01-09
4     50         1  2023-01-09
2    140         2  2023-01-09
6    260         2  2023-01-09
3    140         3  2023-01-09
7    260         3  2023-01-09
```
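Both variants above sort whole date groups, so the rows with positions 2 and 3 end up grouped by position (140/2, 260/2, 140/3, 260/3). That differs from the desired output in the question, where only the 1 and 4 rows swap places. A minimal sketch that keeps every other row in its original slot, assuming the frame is already ordered by 'datetime' (as the question states) and has a unique index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "price": [30, 30, 140, 140, 50, 50, 260, 260],
        "position": [1, 4, 2, 3, 1, 4, 2, 3],
        "datetime": pd.to_datetime(["2023-01-08"] * 2 + ["2023-01-09"] * 6),
    }
)

mask = df["position"].isin([1, 4])

# The 1/4 rows, re-sorted by date ascending and position descending
moved = df[mask].sort_values(["datetime", "position"], ascending=[True, False])

# Leave every other row where it is; refill the 1/4 slots with the re-sorted rows.
# This relies on df being sorted by 'datetime', so slots and rows line up per date.
order = df.index.to_numpy().copy()
order[mask.to_numpy()] = moved.index.to_numpy()
out = df.loc[order]
print(out)
```

With several 1/4 rows on the same date, they are re-sorted among the slots they already occupy, and rows with other positions never move.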

huangapple
  • Posted on 2023-01-09 11:11:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052856.html