Number winning streak IDs in pandas

Question

I have a Python pandas dataframe with winning streaks for some teams over several time periods, and I would like to identify the streaks chronologically. So, what I have is:

import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
    'team_id':       ['A']*6 + ['B']*5,
    'win':           [1,1,1,0,1,1,1,0,0,1,1],
    'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)

And what I would like to have is:

result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
    'team_id':       ['A']*6 + ['B']*5,
    'win':           [1,1,1,0,1,1,1,0,0,1,1],
    'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
    'streak_id':     [1,1,1,None,2,2,1,None,None,2,2]})
print(result)

I tried grouping by team_id and summing streak_length, but the same streak length can occur more than once, so I don't think that would work. Any help appreciated!

Answer 1

Score: 6

Build a key for runs of consecutive equal values with Series.shift, Series.ne and Series.cumsum, keep only the rows where win is 1, then number the runs within each team using GroupBy.transform with pd.factorize in a lambda function:

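# Mask of the rows that are wins
m = data['win'].eq(1)
# Run key: increases by 1 every time win differs from the previous row
g = data['win'].ne(data['win'].shift()).cumsum()

# Keep only the winning runs and renumber them 1, 2, ... within each team
data['streak_id'] = g[m].groupby(data['team_id']).transform(
    lambda x: pd.factorize(x)[0] + 1
)

print(data)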
    period team_id  win  streak_length  streak_id
0        1       A    1              1        1.0
1        2       A    1              2        1.0
2        3       A    1              3        1.0
3        4       A    0              0        NaN
4        5       A    1              1        2.0
5        6       A    1              2        2.0
6        1       B    1              1        1.0
7        2       B    0              0        NaN
8        3       B    0              0        NaN
9        4       B    1              1        2.0
10       5       B    1              2        2.0
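
The output above shows streak_id as floats because the losing rows hold NaN. A variant that keeps integer IDs is sketched below. This is not the answerer's method: it marks the first row of each streak and counts those starts per team, and it assumes the rows are already sorted by period within each team, as in the question. The names won, prev_win and starts are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({
    'period': list(range(1, 7)) + list(range(1, 6)),
    'team_id': ['A'] * 6 + ['B'] * 5,
    'win': [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
})

# Mask of the rows that are wins
won = data['win'].eq(1)
# Previous result within the same team (NaN on each team's first row)
prev_win = data.groupby('team_id')['win'].shift()
# A streak starts on a win whose previous row for that team was not a win
starts = won & prev_win.ne(1)

# Count the starts per team, blank out the losses, and use the nullable
# integer dtype so the IDs stay integers next to missing values
data['streak_id'] = (
    starts.astype(int).groupby(data['team_id']).cumsum()
    .where(won)
    .astype('Int64')
)

print(data)
```

Because the previous result is looked up per team, a win on the last row of one team never merges with a win on the first row of the next team.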

huangapple
  • Posted on January 3, 2020, 19:13:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59577585.html