Pandas “Consecutive”/Rolling Percent Rank


Question

You can get the desired consecutive percent rank with an expanding window in pandas. An expanding window grows by one row at a time instead of having a fixed size:

```python
df['rank'] = df['value'].expanding().apply(
    lambda x: x.rank(ascending=False, pct=True).iloc[-1], raw=False
)
```

For each row, this computes the descending percent rank of that row's value among all rows up to and including it, because the window handed to the lambda grows with expanding().
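A self-contained sketch of the expanding().apply approach, compared against pandas' built-in `Expanding.rank` (which, as far as I know, requires pandas 1.4 or later). The built-in avoids calling a Python lambda on every window, whereas the lambda re-ranks the whole prefix for each row.

```python
import pandas as pd

df = pd.DataFrame({"value": [6, 3, 4, 100, 1]})

# Approach from the text: re-rank each growing window and keep the last row's rank.
# raw=False makes pandas pass each window as a Series, so .rank() and .iloc work.
via_apply = df["value"].expanding().apply(
    lambda x: x.rank(ascending=False, pct=True).iloc[-1], raw=False
)

# Built-in expanding rank (assumes pandas >= 1.4): same descending percent rank.
via_rank = df["value"].expanding().rank(ascending=False, pct=True)

print(via_apply.tolist())
print(via_rank.tolist())
```

Both Series should agree; the built-in is the simpler choice when your pandas version has it.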


How can I create a "consecutive"/rolling percent rank on a pandas DataFrame column, using a window that grows with the row count instead of a fixed window? rolling() requires an integer window size.

Basically, I want the rank to be calculated consecutively rather than running the ranking function once across the entire series. The first row would be ranked against just itself, and by the last row, the rank would be calculated across the entire series.

Desired DataFrame output:

| Index | Value | Rank (descending) | Description |
|---|---|---|---|
| 0 | 6 | 1 | 6 is the first row, so its rank is 1 |
| 1 | 3 | 2 | 3 is the second-largest of 6 and 3, so its rank is 2 |
| 2 | 4 | 2 | 4 is the second-largest of 6, 3, and 4, so its rank is 2 |
| 3 | 100 | 1 | 100 is the largest of 6, 3, 4, and 100, so its rank is 1 |
| 4 | 1 | 5 | 1 is the smallest of 6, 3, 4, 100, and 1, so its rank is last (5) |
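To pin down what the table asks for, here is a plain-Python reference sketch (the helpers `consecutive_rank` and `consecutive_pct_rank` are hypothetical names): each row's descending rank is 1 plus the number of values seen so far that are strictly larger. Ties get the lowest shared rank here, which is an assumption, since the question's data has no ties.

```python
def consecutive_rank(values):
    """Descending rank of each value among all values up to and including it.

    Ties receive the lowest shared rank (like pandas' method="min");
    this tie rule is an assumption, since the question has no ties.
    """
    ranks = []
    for i, v in enumerate(values):
        # Count how many values in the prefix values[0..i] are strictly larger.
        larger = sum(1 for earlier in values[: i + 1] if earlier > v)
        ranks.append(1 + larger)
    return ranks


def consecutive_pct_rank(values):
    """Same rank divided by the prefix length, i.e. a percent rank in (0, 1]."""
    return [r / (i + 1) for i, r in enumerate(consecutive_rank(values))]


print(consecutive_rank([6, 3, 4, 100, 1]))  # [1, 2, 2, 1, 5]
```

This is O(n^2) and only meant as a specification to check a pandas result against.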

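My thinking was to pass a growing window length to rolling():

```python
df['len'] = range(len(df))
# Does not work: rolling() expects one window size, not a Series of sizes.
df['rank'] = df['value'].rolling(df['len']).rank(pct=True)
```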
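The attempt above fails because rolling() takes a single window size rather than a per-row Series. One workaround that stays close to this idea (a sketch, assuming pandas 1.4+ for `Rolling.rank`): use one window at least as long as the DataFrame and set `min_periods=1`, so each window covers every row from the start up to the current one, which is exactly what `expanding()` does.

```python
import pandas as pd

df = pd.DataFrame({"value": [6, 3, 4, 100, 1]})

# A fixed window of len(df) rows never drops anything from the start of the
# series, and min_periods=1 lets the early, shorter windows produce a result.
grown = df["value"].rolling(window=len(df), min_periods=1).rank(ascending=False)

# The same computation written with an explicitly growing window.
expanding = df["value"].expanding().rank(ascending=False)

print(grown.tolist())  # [1.0, 2.0, 2.0, 1.0, 5.0]
print(grown.tolist() == expanding.tolist())
```

In practice `expanding()` states the intent more directly; the rolling form is mainly useful for seeing why the two are equivalent.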

Answer 1

Score: 0


Example

```python
import pandas as pd

df = pd.DataFrame([6, 3, 4, 100, 1], columns=['value'])
```

df

```
   value
0      6
1      3
2      4
3    100
4      1
```

Code

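```python
# Descending rank of each value within the growing window (Expanding.rank needs
# pandas 1.4+); .astype('int') turns the float ranks into whole numbers.
# assign() returns a new DataFrame and leaves df itself unchanged.
df.assign(rank=df['value'].expanding().rank(ascending=False).astype('int'))
```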

Output:

```
   value  rank
0      6     1
1      3     2
2      4     2
3    100     1
4      1     5
```
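One caveat with `.astype('int')`: `Expanding.rank` defaults to `method='average'`, so tied values get fractional ranks such as 1.5, and the integer cast silently truncates them. Below is a sketch on a made-up series containing a tie (the values are illustrative, not from the question), using `method='min'` so tied values share the lower whole-number rank.

```python
import pandas as pd

# Illustrative data with a tie (the second 6); not from the original question.
s = pd.Series([6, 3, 6, 1])

avg = s.expanding().rank(ascending=False)                # default method="average"
low = s.expanding().rank(ascending=False, method="min")  # ties share the lowest rank

print(avg.tolist())                # [1.0, 2.0, 1.5, 4.0]
print(low.astype("int").tolist())  # [1, 2, 1, 4]
```

`method` also accepts `"max"`; which tie rule fits depends on how you want duplicates reported.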

If you want the percent rank instead, use the following code:

```python
# pct=True divides each descending rank by the number of rows seen so far.
df.assign(rank_pct=df['value'].expanding().rank(ascending=False, pct=True))
```

Output:

```
   value  rank_pct
0      6  1.000000
1      3  1.000000
2      4  0.666667
3    100  0.250000
4      1  1.000000
```
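The percent rank is the descending rank divided by the number of rows in the window so far, so with `ascending=False` the largest value seen so far gets 1/n and the smallest gets 1.0. The sketch below checks that relationship with `Expanding.count()` and, as an alternative reading of "percent rank", shows `ascending=True`, where the largest value so far gets 1.0 (which convention you want is your call).

```python
import pandas as pd

df = pd.DataFrame({"value": [6, 3, 4, 100, 1]})

window = df["value"].expanding()
rank_desc = window.rank(ascending=False)  # 1 = largest value seen so far
seen = window.count()                     # rows in each growing window: 1, 2, 3, ...

# pct=True is the same as dividing the rank by the window length.
manual_pct = rank_desc / seen
builtin_pct = window.rank(ascending=False, pct=True)

# With ascending=True instead, the largest value seen so far gets 1.0.
pct_high_is_one = window.rank(pct=True)

print(pct_high_is_one.tolist())
```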

huangapple
  • Posted on May 14, 2023, 09:24:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76245457.html