英文:
Number winning streak ID's in pandas
问题
我有一个Python pandas数据框,其中包含一些球队在多个时间段内的连胜记录,并且我想要按照时间顺序识别这些连胜记录。所以,我有如下数据:
import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)
我想要的结果是:
result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
'streak_id': [1,1,1,None,2,2,1,None,None,2,2]})
print(result)
我尝试了按`team_id`分组并对连胜长度求和,但可能会出现重复,所以我认为这种方法行不通。感谢任何帮助!
英文:
I have a Python pandas dataframe with winning streaks for some teams over several time periods and I would like to identfy the streaks chronologically. So, what I have is:
import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)
And what I would like to have is:
result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
'streak_id': [1,1,1,None,2,2,1,None,None,2,2]})
print(result)
I tried to groupby by team_id
and sum over streak length, but it can be repeated, so I think this would not work. Any help appreciated!
答案1
得分: 6
使用Series.shift
、Series.ne
和Series.cumsum
创建连续的分组,仅筛选win
中的1
,并使用GroupBy.transform
和lambda函数中的factorize
:
m = data['win'].eq(1)
g = data['win'].ne(data['win'].shift()).cumsum()
data['streak_id'] = g[m].groupby(data['team_id']).transform(
lambda x: pd.factorize(x)[0] + 1
)
打印结果如下:
period team_id win streak_length streak_id
0 1 A 1 1 1.0
1 2 A 1 2 1.0
2 3 A 1 3 1.0
3 4 A 0 0 NaN
4 5 A 1 1 2.0
5 6 A 1 2 2.0
6 1 B 1 1 1.0
7 2 B 0 0 NaN
8 3 B 0 0 NaN
9 4 B 1 1 2.0
10 5 B 1 2 2.0
英文:
Create consecutive groups by Series.shift
Series.ne
and Series.cumsum
, filter only 1
in win
and use GroupBy.transform
with factorize
in lambda function:
m = data['win'].eq(1)
g = data['win'].ne(data['win'].shift()).cumsum()
data['streak_id'] = g[m].groupby(data['team_id']).transform(
lambda x: pd.factorize(x)[0] + 1
)
print (data)
period team_id win streak_length streak_id
0 1 A 1 1 1.0
1 2 A 1 2 1.0
2 3 A 1 3 1.0
3 4 A 0 0 NaN
4 5 A 1 1 2.0
5 6 A 1 2 2.0
6 1 B 1 1 1.0
7 2 B 0 0 NaN
8 3 B 0 0 NaN
9 4 B 1 1 2.0
10 5 B 1 2 2.0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论