Pandas “Consecutive”/Rolling Percent Rank

You can get the desired consecutive percent rank with an expanding window in pandas, which grows by one row at each step:

df['rank'] = df['value'].expanding().apply(lambda x: (x.rank(ascending=False, pct=True)).iloc[-1])

For each row, this ranks all values seen so far in descending order and keeps the percent rank of the newest value.

Question

How can I create a "consecutive" (rolling) percent rank on a pandas Series, using a window that grows as the row count grows instead of staying a fixed size? rolling() requires an integer for the window size.

I basically want the ranking to be consecutively calculated as opposed to running the ranking function across the entire series and outputting the results. First row would have a rank just on that one row. By the end of the data frame, the rank would be calculated across the entire series.

Desired dataframe output:

Index  Value  Rank (descending)  Description
0      6      1                  6 is the first row, so the rank of 6 is 1
1      3      2                  3 is the second largest value of 6 and 3, so the rank is 2
2      4      2                  4 is the second largest value of 6, 3 and 4, so the rank is 2
3      100    1                  100 is the largest value in the series 6, 3, 4, 100, so the rank is 1
4      1      5                  1 is the smallest value in the series 6, 3, 4, 100, 1, so the rank is last, at 5

My thinking:

df['len']=range(len(df))
df['rank']=df['value'].rolling(df['len']).rank(pct=True)

Answer 1

Score: 0


Example

import pandas as pd
df = pd.DataFrame([6, 3, 4, 100, 1], columns=['value'])

df

    value
0	6
1	3
2	4
3	100
4	1

Code

df.assign(rank=df['value'].expanding().rank(ascending=False).astype('int'))

output:

    value	rank
0	6	    1
1	3	    2
2	4	    2
3	100	    1
4	1	    5
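
To see what the expanding rank computes, here is a plain-Python sketch of the same idea: each value is ranked only against itself and the rows before it. The helper name running_desc_rank is made up for illustration, and the sketch assumes there are no ties (pandas averages tied ranks by default, which is not reproduced here).

```python
values = [6, 3, 4, 100, 1]


def running_desc_rank(values):
    """Rank each value among itself and all earlier values (1 = largest).

    Hypothetical helper for illustration; assumes the values contain no ties.
    """
    ranks = []
    for i, current in enumerate(values):
        # Count how many of the earlier values are strictly greater.
        larger = sum(1 for earlier in values[:i] if earlier > current)
        ranks.append(larger + 1)
    return ranks


manual = running_desc_rank(values)
print(manual)  # [1, 2, 2, 1, 5]
```

This gives the same ranks as the output above, which is a quick way to check that the expanding window does what the question asks for.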

If you want a percent rank, use the following code:

df.assign(rank_pct=df['value'].expanding().rank(ascending=False, pct=True))

output:

value	rank_pct
0	6	1.000000
1	3	1.000000
2	4	0.666667
3	100	0.250000
4	1	1.000000
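
As far as I know, expanding().rank() only exists in newer pandas releases (1.4 and later, I believe), so here is a sketch comparing it with two alternatives: the expanding().apply() form, and a rolling() window as long as the frame with min_periods=1, which is close to the attempt in the question. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([6, 3, 4, 100, 1], columns=['value'])

# Built-in expanding rank (assumed to need pandas 1.4 or later).
builtin = df['value'].expanding().rank(ascending=False, pct=True)

# Fallback: rank each expanding window as a Series and keep the last row's rank.
via_apply = df['value'].expanding().apply(
    lambda window: window.rank(ascending=False, pct=True).iloc[-1],
    raw=False,  # pass a Series, so .rank() and .iloc are available
)

# A rolling window as long as the frame, with min_periods=1, grows the same way.
via_rolling = df['value'].rolling(window=len(df), min_periods=1).rank(
    ascending=False, pct=True
)

print(builtin.round(6).tolist())  # [1.0, 1.0, 0.666667, 0.25, 1.0]
```

All three should give the same column here. The apply version calls a Python function for every row, so it will likely be slower on a large frame.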

huangapple
  • Posted on 2023-05-14 09:24:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76245457.html