Pandas Grouped Cumulative Count with Condition
Question
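I have a pandas DataFrame with Year, UserID, and Points columns. I'm trying to derive the fourth column below: a running count of consecutive years in which a user has 0 points or fewer. The count is zero-based, so the first year of a streak is 0, and any year with positive points resets it to 0.
Example DataFrame: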
Year | UserID | Points | RunningCount |
---|---|---|---|
2010 | 13 | 10 | 0 |
2011 | 13 | 0 | 0 |
2012 | 13 | -5 | 1 |
2013 | 13 | 0 | 2 |
2014 | 13 | 4 | 0 |
2010 | 77 | -9 | 0 |
2011 | 77 | -1 | 1 |
2012 | 77 | 5 | 0 |
2013 | 77 | 0 | 0 |
2014 | 77 | -1 | 1 |
Answer 1
Score: 1
You can first build a grouping key that labels each run of consecutive years on the same side of the threshold (Points <= 0 versus Points > 0) for each user, then take a cumcount within each group:
neg_group = df.Points.le(0).diff().ne(0).groupby(df.UserID).cumsum()
neg_group
0 1
1 2
2 2
3 2
4 3
5 1
6 1
7 2
8 3
9 3
Name: Points, dtype: int64
df.groupby([df.UserID, neg_group]).cumcount()
0 0
1 0
2 1
3 2
4 0
5 0
6 1
7 0
8 0
9 1
dtype: int64
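For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and adds one step that is not in the answer above: masking rows with positive points back to 0. That masking is my assumption about the intended behaviour, based on the expected column always showing 0 for positive years. The names `non_pos`, `streak_start`, and `streak_id` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2010, 2011, 2012, 2013, 2014] * 2,
    "UserID": [13] * 5 + [77] * 5,
    "Points": [10, 0, -5, 0, 4, -9, -1, 5, 0, -1],
})

# Assumption: rows must be ordered by user and year for "consecutive" to be meaningful.
df = df.sort_values(["UserID", "Year"]).reset_index(drop=True)

# 1 where the year has 0 points or fewer, else 0.
non_pos = df["Points"].le(0).astype(int)

# A new streak starts wherever the flag differs from the same user's previous row.
# The first row of each user is compared against NaN, so it always starts a streak.
streak_start = non_pos.ne(non_pos.groupby(df["UserID"]).shift())
streak_id = streak_start.cumsum()

# Zero-based position within each streak, forced to 0 on positive-point years.
df["RunningCount"] = df.groupby(streak_id).cumcount().where(non_pos.eq(1), 0)

print(df)
```

On the sample data this gives the same column as the one-liner above. The two only diverge when a user has two or more positive years in a row: the plain cumcount would number those rows 0, 1, and so on, while the masked version keeps them at 0.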