英文:
Sum of values from multiple column, C & D based on a unique value appearing in column A & B in Pandas
问题
From a .csv file, I have columns and a few data:
Season HomeTeam AwayTeam HomeYellow AwayYellow HomeFoul AwayFoul
2000-01 Chelsea West Ham 1 2 19 14
2000-01 Bradford Chelsea 0 1 14 16
2000-01 Chelsea Arsenal 3 1 16 22
.
.
2002-03 Chelsea Blackburn 2 1 11 12
2002-03 N.castle Chelsea 0 1 12 22
.
.
2006-07 Watford Chelsea 1 3 18 9
2006-07 Chelsea Tottenham 2 1 13 11
I want to find out the tally of Yellow Cards given to Chelsea and have the sum grouped by Season.
Similarly I want to swap the Yellow Card details to find out the total Fouls to Chelsea / Season.
I'm guessing the line of code will be the same and only need to swap the column name in a different cell?
Because I want to do the same by swapping the Team Name too.
What I hope to achieve is: (For Chelsea)
Season YellowCards
2000-01 5
2002-03 3
2006-07 5
And similarly
Season Fouls
2000-01 51
2002-03 33
2006-07 22
英文:
From a .csv file, I have columns and a few data:
Season HomeTeam AwayTeam HomeYellow AwayYellow HomeFoul AwayFoul
2000-01 Chelsea West Ham 1 2 19 14
2000-01 Bradford Chelsea 0 1 14 16
2000-01 Chelsea Arsenal 3 1 16 22
.
.
2002-03 Chelsea Blackburn 2 1 11 12
2002-03 N.castle Chelsea 0 1 12 22
.
.
2006-07 Watford Chelsea 1 3 18 9
2006-07 Chelsea Tottenham 2 1 13 11
I want to find out the tally of Yellow Cards given to Chelsea and have the sum grouped by Season.
Similarly I want to swap the Yellow Card details to find out the total Fouls to Chelsea / Season.
I'm guessing the line of code will be the same and only need to swap the column name in a different cell?
Because I want to do the same by swapping the Team Name too.
What I hope to achieve is: (For Chelsea)
Season YellowCards
2000-01 5
2002-03 3
2006-07 5
And similarly
Season Fouls
2000-01 51
2002-03 33
2006-07 22
答案1
得分: 3
使用concat
函数,同时重命名Home
和Away
列,并聚合sum
:
epl = pd.DataFrame({'Season': {0: '2000-01', 1: '2000-01', 2: '2000-01', 3: '2002-03', 4: '2002-03', 5: '2006-07', 6: '2006-07'}, 'HomeTeam': {0: 'Chelsea', 1: 'Bradford', 2: 'Chelsea', 3: 'Chelsea', 4: 'N.castle', 5: 'Watford', 6: 'Chelsea'}, 'AwayTeam': {0: 'West Ham', 1: 'Chelsea', 2: 'Arsenal', 3: 'Blackburn', 4: 'Chelsea', 5: 'Chelsea', 6: 'Tottenham'}, 'HomeYellow': {0: 1, 1: 0, 2: 3, 3: 2, 4: 0, 5: 1, 6: 2}, 'AwayYellow': {0: 2, 1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 1}, 'HomeFoul': {0: 19, 1: 14, 2: 16, 3: 11, 4: 12, 5: 18, 6: 13}, 'AwayFoul': {0: 14, 1: 16, 2: 22, 3: 12, 4: 22, 5: 9, 6: 11}})
print (epl)
Season HomeTeam AwayTeam HomeYellow AwayYellow HomeFoul AwayFoul
0 2000-01 Chelsea West Ham 1 2 19 14
1 2000-01 Bradford Chelsea 0 1 14 16
2 2000-01 Chelsea Arsenal 3 1 16 22
3 2002-03 Chelsea Blackburn 2 1 11 12
4 2002-03 N.castle Chelsea 0 1 12 22
5 2006-07 Watford Chelsea 1 3 18 9
6 2006-07 Chelsea Tottenham 2 1 13 11
out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
'HomeYellow':'YellowCards',
'HomeFoul':'Fouls'}),
epl.rename(columns={'AwayTeam':'Team',
'AwayYellow':'YellowCards',
'AwayFoul':'Fouls'})])
.groupby(['Season','Team'], as_index=False)[['YellowCards','Fouls']]
.sum())
print (out)
Season Team YellowCards Fouls
0 2000-01 Arsenal 1 22
1 2000-01 Bradford 0 14
2 2000-01 Chelsea 5 51
3 2000-01 West Ham 2 14
4 2002-03 Blackburn 1 12
5 2002-03 Chelsea 3 33
6 2002-03 N.castle 0 12
7 2006-07 Chelsea 5 22
8 2006-07 Tottenham 1 11
9 2006-07 Watford 1 18
如果需要分别查看队伍数据:
chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
Season YellowCards
2 2000-01 5
5 2002-03 3
7 2006-07 5
chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
Season Fouls
2 2000-01 51
5 2002-03 33
7 2006-07 22
编辑:使用您的数据进行测试:
df = pd.read_csv('results.csv')
# 如果需要编码处理
# df = pd.read_csv('results.csv', encoding="ISO-8859-1")
epl = df.dropna().reset_index(drop=True)
# print (epl)
out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
'HY':'YellowCards',
'HF':'Fouls'}),
epl.rename(columns={'AwayTeam':'Team',
'AY':'YellowCards',
'AF':'Fouls'})])
.groupby(['Season','Team'])[['YellowCards','Fouls']]
.sum()
.astype(int)
.reset_index())
print (out)
查看切尔西队的数据:
chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
请注意,这只是提供给您的翻译,代码部分没有翻译。
英文:
Use concat
with rename Home
and Away
columns with aggregate sum
:
epl = pd.DataFrame({'Season': {0: '2000-01', 1: '2000-01', 2: '2000-01', 3: '2002-03', 4: '2002-03', 5: '2006-07', 6: '2006-07'}, 'HomeTeam': {0: 'Chelsea', 1: 'Bradford', 2: 'Chelsea', 3: 'Chelsea', 4: 'N.castle', 5: 'Watford', 6: 'Chelsea'}, 'AwayTeam': {0: 'West Ham', 1: 'Chelsea', 2: 'Arsenal', 3: 'Blackburn', 4: 'Chelsea', 5: 'Chelsea', 6: 'Tottenham'}, 'HomeYellow': {0: 1, 1: 0, 2: 3, 3: 2, 4: 0, 5: 1, 6: 2}, 'AwayYellow': {0: 2, 1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 1}, 'HomeFoul': {0: 19, 1: 14, 2: 16, 3: 11, 4: 12, 5: 18, 6: 13}, 'AwayFoul': {0: 14, 1: 16, 2: 22, 3: 12, 4: 22, 5: 9, 6: 11}})
print (epl)
Season HomeTeam AwayTeam HomeYellow AwayYellow HomeFoul AwayFoul
0 2000-01 Chelsea West Ham 1 2 19 14
1 2000-01 Bradford Chelsea 0 1 14 16
2 2000-01 Chelsea Arsenal 3 1 16 22
3 2002-03 Chelsea Blackburn 2 1 11 12
4 2002-03 N.castle Chelsea 0 1 12 22
5 2006-07 Watford Chelsea 1 3 18 9
6 2006-07 Chelsea Tottenham 2 1 13 11
out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
'HomeYellow':'YellowCards',
'HomeFoul':'Fouls'}),
epl.rename(columns={'AwayTeam':'Team',
'AwayYellow':'YellowCards',
'AwayFoul':'Fouls'})])
.groupby(['Season','Team'], as_index=False)[['YellowCards','Fouls']]
.sum())
print (out)
Season Team YellowCards Fouls
0 2000-01 Arsenal 1 22
1 2000-01 Bradford 0 14
2 2000-01 Chelsea 5 51
3 2000-01 West Ham 2 14
4 2002-03 Blackburn 1 12
5 2002-03 Chelsea 3 33
6 2002-03 N.castle 0 12
7 2006-07 Chelsea 5 22
8 2006-07 Tottenham 1 11
9 2006-07 Watford 1 18
If need separately team:
chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
Season YellowCards
2 2000-01 5
5 2002-03 3
7 2006-07 5
chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
Season Fouls
2 2000-01 51
5 2002-03 33
7 2006-07 22
EDIT: Testing with your data:
df = pd.read_csv('results.csv')
#if neccessary
#df = pd.read_csv('results.csv', encoding = "ISO-8859-1")
epl=df.dropna().reset_index(drop=True)
# print (epl)
out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
'HY':'YellowCards',
'HF':'Fouls'}),
epl.rename(columns={'AwayTeam':'Team',
'AY':'YellowCards',
'AF':'Fouls'})])
.groupby(['Season','Team'])[['YellowCards','Fouls']]
.sum()
.astype(int)
.reset_index())
print (out)
Season Team YellowCards Fouls
0 2000-01 Arsenal 48 495
1 2000-01 Aston Villa 66 491
2 2000-01 Bradford 57 517
3 2000-01 Charlton 44 467
4 2000-01 Chelsea 75 545
.. ... ... ... ...
435 2021-22 Southampton 54 349
436 2021-22 Tottenham 54 317
437 2021-22 Watford 52 373
438 2021-22 West Ham 38 273
439 2021-22 Wolves 53 331
[440 rows x 4 columns]
chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
Season YellowCards
4 2000-01 75
25 2001-02 70
46 2002-03 63
66 2003-04 45
86 2004-05 53
106 2005-06 63
125 2006-07 64
145 2007-08 65
164 2008-09 50
186 2009-10 56
206 2010-11 59
224 2011-12 75
242 2012-13 52
263 2013-14 56
283 2014-15 77
303 2015-16 58
323 2016-17 72
344 2017-18 41
365 2018-19 49
385 2019-20 60
404 2020-21 48
425 2021-22 48
chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
Season Fouls
4 2000-01 545
25 2001-02 496
46 2002-03 445
66 2003-04 442
86 2004-05 474
106 2005-06 434
125 2006-07 436
145 2007-08 432
164 2008-09 417
186 2009-10 441
206 2010-11 405
224 2011-12 403
242 2012-13 371
263 2013-14 420
283 2014-15 382
303 2015-16 408
323 2016-17 394
344 2017-18 363
365 2018-19 339
385 2019-20 386
404 2020-21 433
425 2021-22 324
答案2
得分: 2
使用pd.wide_to_long
来解除数据帧的脱离形式,有时候对于未来的分析会很有好处:
out = (pd.wide_to_long(df.rename_axis('MatchId').reset_index(),
stubnames=('Home', 'Away'),
i=['MatchId', 'Season'], j='Variable', suffix='\w+')
.rename_axis(columns='HomeAway').stack()
.unstack('Variable').rename_axis(columns=None).reset_index())
print(out)
# 输出
MatchId Season HomeAway Foul Team Yellow
0 0 2000-01 Home 19 Chelsea 1
1 0 2000-01 Away 14 West Ham 2
2 1 2000-01 Home 14 Bradford 0
3 1 2000-01 Away 16 Chelsea 1
4 2 2000-01 Home 16 Chelsea 3
5 2 2000-01 Away 22 Arsenal 1
6 3 2002-03 Home 11 Chelsea 2
7 3 2002-03 Away 12 Blackburn 1
8 4 2002-03 Home 12 N.castle 0
9 4 2002-03 Away 22 Chelsea 1
10 5 2006-07 Home 18 Watford 1
11 5 2006-07 Away 9 Chelsea 3
12 6 2006-07 Home 13 Chelsea 2
13 6 2006-07 Away 11 Tottenham 1
之后:
>>> out.groupby(['Season', 'Team'], as_index=False)[['Foul', 'Yellow']].sum()
Season Team Foul Yellow
0 2000-01 Arsenal 22 1
1 2000-01 Bradford 14 0
2 2000-01 Chelsea 51 5
3 2000-01 West Ham 14 2
4 2002-03 Blackburn 12 1
5 2002-03 Chelsea 33 3
6 2002-03 N.castle 12 0
7 2006-07 Chelsea 22 5
8 2006-07 Tottenham 11 1
9 2006-07 Watford 18 1
对于切尔西队:
# 单独
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['YellowCards'].sum()
Season YellowCards
0 2000-01 5
1 2002-03 3
2 2006-07 5
# 单独
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['Fouls'].sum()
Season Fouls
0 2000-01 51
1 2002-03 33
2 2006-07 22
# 综合
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)[['YellowCards', 'Fouls']].sum()
Season YellowCards Fouls
0 2000-01 5 51
1 2002-03 3 33
2 2006-07 5 22
方法仍然相同,只需在groupby_sum
之前筛选数据帧,因此可以使用以下函数:
def search(df, team, variable):
return df[df['Team'] == team].groupby('Season', as_index=False)[variable].sum()
print(search(out, 'Chelsea', 'Fouls'))
# 输出
Season Fouls
0 2000-01 51
1 2002-03 33
2 2006-07 22
英文:
Sometimes it could be good to change the shape of our dataframe for future analysis. Use pd.wide_to_long
to unpivot your dataframe:
out = (pd.wide_to_long(df.rename_axis('MatchId').reset_index(),
stubnames=('Home', 'Away'),
i=['MatchId', 'Season'], j='Variable', suffix='\w+')
.rename_axis(columns='HomeAway').stack()
.unstack('Variable').rename_axis(columns=None).reset_index())
print(out)
# Out
MatchId Season HomeAway Foul Team Yellow
0 0 2000-01 Home 19 Chelsea 1
1 0 2000-01 Away 14 West Ham 2
2 1 2000-01 Home 14 Bradford 0
3 1 2000-01 Away 16 Chelsea 1
4 2 2000-01 Home 16 Chelsea 3
5 2 2000-01 Away 22 Arsenal 1
6 3 2002-03 Home 11 Chelsea 2
7 3 2002-03 Away 12 Blackburn 1
8 4 2002-03 Home 12 N.castle 0
9 4 2002-03 Away 22 Chelsea 1
10 5 2006-07 Home 18 Watford 1
11 5 2006-07 Away 9 Chelsea 3
12 6 2006-07 Home 13 Chelsea 2
13 6 2006-07 Away 11 Tottenham 1
After that:
>>> out.groupby(['Season', 'Team'], as_index=False)[['Foul', 'Yellow']].sum()
Season Team Foul Yellow
0 2000-01 Arsenal 22 1
1 2000-01 Bradford 14 0
2 2000-01 Chelsea 51 5
3 2000-01 West Ham 14 2
4 2002-03 Blackburn 12 1
5 2002-03 Chelsea 33 3
6 2002-03 N.castle 12 0
7 2006-07 Chelsea 22 5
8 2006-07 Tottenham 11 1
9 2006-07 Watford 18 1
For Chelsea:
# Individual
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['YellowCards'].sum()
Season YellowCards
0 2000-01 5
1 2002-03 3
2 2006-07 5
# Individual
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['Fouls'].sum()
Season Fouls
0 2000-01 51
1 2002-03 33
2 2006-07 22
# Together
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)[['YellowCards', 'Fouls']].sum()
Season YellowCards Fouls
0 2000-01 5 51
1 2002-03 3 33
2 2006-07 5 22
The method still the same, you just have to filter the dataframe before the groupby_sum
, so you can use a function:
def search(df, team, variable):
return df[df['Team'] == team].groupby('Season', as_index=False)[variable].sum()
print(search(out, 'Chelsea', 'Fouls'))
# Output
Season Fouls
0 2000-01 51
1 2002-03 33
2 2006-07 22
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论