在Pandas中,基于列A和B中出现的唯一值,计算多列C和D的值之和。

huangapple go评论74阅读模式
英文:

Sum of values from multiple column, C & D based on a unique value appearing in column A & B in Pandas

问题

From a .csv file, I have columns and a few data:

Season   HomeTeam   AwayTeam   HomeYellow   AwayYellow   HomeFoul   AwayFoul
2000-01  Chelsea    West Ham       1             2          19         14
2000-01  Bradford   Chelsea        0             1          14         16
2000-01  Chelsea    Arsenal        3             1          16         22
.
.
2002-03  Chelsea    Blackburn      2             1          11         12
2002-03  N.castle   Chelsea        0             1          12         22
.
.
2006-07  Watford    Chelsea        1             3          18         9
2006-07  Chelsea    Tottenham      2             1          13         11

I want to find out the tally of Yellow Cards given to Chelsea and have the sum grouped by Season.
Similarly I want to swap the Yellow Card details to find out the total Fouls to Chelsea / Season.

I'm guessing the line of code will be the same and only need to swap the column name in a different cell?
Because I want to do the same by swapping the Team Name too.

What I hope to achieve is: (For Chelsea)

Season  YellowCards

2000-01      5
2002-03      3
2006-07      5

And similarly

Season   Fouls

2000-01   51
2002-03   33
2006-07   22
英文:

From a .csv file, I have columns and a few data:

Season   HomeTeam   AwayTeam   HomeYellow   AwayYellow   HomeFoul   AwayFoul
2000-01  Chelsea    West Ham       1             2          19         14
2000-01  Bradford   Chelsea        0             1          14         16
2000-01  Chelsea    Arsenal        3             1          16         22
.
.
2002-03  Chelsea    Blackburn      2             1          11         12
2002-03  N.castle   Chelsea        0             1          12         22
.
.
2006-07  Watford    Chelsea        1             3          18         9
2006-07  Chelsea    Tottenham      2             1          13         11

I want to find out the tally of Yellow Cards given to Chelsea and have the sum grouped by Season.
Similarly I want to swap the Yellow Card details to find out the total Fouls to Chelsea / Season.

I'm guessing the line of code will be the same and only need to swap the column name in a different cell?
Because I want to do the same by swapping the Team Name too.

What I hope to achieve is: (For Chelsea)

Season  YellowCards

2000-01      5
2002-03      3
2006-07      5

And similarly

Season   Fouls

2000-01   51  
2002-03   33  
2006-07   22

答案1

得分: 3

使用concat函数,同时重命名HomeAway列,并聚合sum

epl = pd.DataFrame({'Season': {0: '2000-01', 1: '2000-01', 2: '2000-01', 3: '2002-03', 4: '2002-03', 5: '2006-07', 6: '2006-07'}, 'HomeTeam': {0: 'Chelsea', 1: 'Bradford', 2: 'Chelsea', 3: 'Chelsea', 4: 'N.castle', 5: 'Watford', 6: 'Chelsea'}, 'AwayTeam': {0: 'West Ham', 1: 'Chelsea', 2: 'Arsenal', 3: 'Blackburn', 4: 'Chelsea', 5: 'Chelsea', 6: 'Tottenham'}, 'HomeYellow': {0: 1, 1: 0, 2: 3, 3: 2, 4: 0, 5: 1, 6: 2}, 'AwayYellow': {0: 2, 1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 1}, 'HomeFoul': {0: 19, 1: 14, 2: 16, 3: 11, 4: 12, 5: 18, 6: 13}, 'AwayFoul': {0: 14, 1: 16, 2: 22, 3: 12, 4: 22, 5: 9, 6: 11}})
print (epl)
    Season  HomeTeam   AwayTeam  HomeYellow  AwayYellow  HomeFoul  AwayFoul
0  2000-01   Chelsea   West Ham           1           2        19        14
1  2000-01  Bradford    Chelsea           0           1        14        16
2  2000-01   Chelsea    Arsenal           3           1        16        22
3  2002-03   Chelsea  Blackburn           2           1        11        12
4  2002-03  N.castle    Chelsea           0           1        12        22
5  2006-07   Watford    Chelsea           1           3        18         9
6  2006-07   Chelsea  Tottenham           2           1        13        11
out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
                                     'HomeYellow':'YellowCards',
                                     'HomeFoul':'Fouls'}),
                  epl.rename(columns={'AwayTeam':'Team',
                                     'AwayYellow':'YellowCards',
                                     'AwayFoul':'Fouls'})])
         .groupby(['Season','Team'], as_index=False)[['YellowCards','Fouls']]
         .sum())
print (out)
    Season       Team  YellowCards  Fouls
0  2000-01    Arsenal            1     22
1  2000-01   Bradford            0     14
2  2000-01    Chelsea            5     51
3  2000-01   West Ham            2     14
4  2002-03  Blackburn            1     12
5  2002-03    Chelsea            3     33
6  2002-03   N.castle            0     12
7  2006-07    Chelsea            5     22
8  2006-07  Tottenham            1     11
9  2006-07    Watford            1     18

如果需要分别查看队伍数据:

chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
    Season  YellowCards
2  2000-01            5
5  2002-03            3
7  2006-07            5

chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
    Season  Fouls
2  2000-01     51
5  2002-03     33
7  2006-07     22

编辑:使用您的数据进行测试:

df = pd.read_csv('results.csv')
# 如果需要编码处理
# df = pd.read_csv('results.csv', encoding="ISO-8859-1")

epl = df.dropna().reset_index(drop=True)
# print (epl)

out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
                                     'HY':'YellowCards',
                                     'HF':'Fouls'}),
                  epl.rename(columns={'AwayTeam':'Team',
                                     'AY':'YellowCards',
                                     'AF':'Fouls'})])
         .groupby(['Season','Team'])[['YellowCards','Fouls']]
         .sum()
         .astype(int)
         .reset_index())
print (out)

查看切尔西队的数据:

chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)

请注意,这只是提供给您的翻译,代码部分没有翻译。

英文:

Use concat with rename Home and Away columns with aggregate sum:

epl = pd.DataFrame({'Season': {0: '2000-01', 1: '2000-01', 2: '2000-01', 3: '2002-03', 4: '2002-03', 5: '2006-07', 6: '2006-07'}, 'HomeTeam': {0: 'Chelsea', 1: 'Bradford', 2: 'Chelsea', 3: 'Chelsea', 4: 'N.castle', 5: 'Watford', 6: 'Chelsea'}, 'AwayTeam': {0: 'West Ham', 1: 'Chelsea', 2: 'Arsenal', 3: 'Blackburn', 4: 'Chelsea', 5: 'Chelsea', 6: 'Tottenham'}, 'HomeYellow': {0: 1, 1: 0, 2: 3, 3: 2, 4: 0, 5: 1, 6: 2}, 'AwayYellow': {0: 2, 1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 1}, 'HomeFoul': {0: 19, 1: 14, 2: 16, 3: 11, 4: 12, 5: 18, 6: 13}, 'AwayFoul': {0: 14, 1: 16, 2: 22, 3: 12, 4: 22, 5: 9, 6: 11}})
print (epl)
    Season  HomeTeam   AwayTeam  HomeYellow  AwayYellow  HomeFoul  AwayFoul
0  2000-01   Chelsea   West Ham           1           2        19        14
1  2000-01  Bradford    Chelsea           0           1        14        16
2  2000-01   Chelsea    Arsenal           3           1        16        22
3  2002-03   Chelsea  Blackburn           2           1        11        12
4  2002-03  N.castle    Chelsea           0           1        12        22
5  2006-07   Watford    Chelsea           1           3        18         9
6  2006-07   Chelsea  Tottenham           2           1        13        11

out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
                                     'HomeYellow':'YellowCards',
                                     'HomeFoul':'Fouls'}),
                  epl.rename(columns={'AwayTeam':'Team',
                                     'AwayYellow':'YellowCards',
                                     'AwayFoul':'Fouls'})])
         .groupby(['Season','Team'], as_index=False)[['YellowCards','Fouls']]
         .sum())
print (out)
    Season       Team  YellowCards  Fouls
0  2000-01    Arsenal            1     22
1  2000-01   Bradford            0     14
2  2000-01    Chelsea            5     51
3  2000-01   West Ham            2     14
4  2002-03  Blackburn            1     12
5  2002-03    Chelsea            3     33
6  2002-03   N.castle            0     12
7  2006-07    Chelsea            5     22
8  2006-07  Tottenham            1     11
9  2006-07    Watford            1     18

If need separately team:

chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
    Season  YellowCards
2  2000-01            5
5  2002-03            3
7  2006-07            5

chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
    Season  Fouls
2  2000-01     51
5  2002-03     33
7  2006-07     22

EDIT: Testing with your data:

df = pd.read_csv('results.csv')
#if neccessary
#df = pd.read_csv('results.csv', encoding = "ISO-8859-1")

epl=df.dropna().reset_index(drop=True)
# print (epl)

out = (pd.concat([epl.rename(columns={'HomeTeam':'Team',
                                     'HY':'YellowCards',
                                     'HF':'Fouls'}),
                  epl.rename(columns={'AwayTeam':'Team',
                                     'AY':'YellowCards',
                                     'AF':'Fouls'})])
         .groupby(['Season','Team'])[['YellowCards','Fouls']]
         .sum()
         .astype(int)
         .reset_index())
print (out)
      Season         Team  YellowCards  Fouls
0    2000-01      Arsenal           48    495
1    2000-01  Aston Villa           66    491
2    2000-01     Bradford           57    517
3    2000-01     Charlton           44    467
4    2000-01      Chelsea           75    545
..       ...          ...          ...    ...
435  2021-22  Southampton           54    349
436  2021-22    Tottenham           54    317
437  2021-22      Watford           52    373
438  2021-22     West Ham           38    273
439  2021-22       Wolves           53    331

[440 rows x 4 columns]

chelsea1 = out.loc[out['Team'].eq('Chelsea'), ['Season','YellowCards']]
print (chelsea1)
      Season  YellowCards
4    2000-01           75
25   2001-02           70
46   2002-03           63
66   2003-04           45
86   2004-05           53
106  2005-06           63
125  2006-07           64
145  2007-08           65
164  2008-09           50
186  2009-10           56
206  2010-11           59
224  2011-12           75
242  2012-13           52
263  2013-14           56
283  2014-15           77
303  2015-16           58
323  2016-17           72
344  2017-18           41
365  2018-19           49
385  2019-20           60
404  2020-21           48
425  2021-22           48

chelsea2 = out.loc[out['Team'].eq('Chelsea'), ['Season','Fouls']]
print (chelsea2)
      Season  Fouls
4    2000-01    545
25   2001-02    496
46   2002-03    445
66   2003-04    442
86   2004-05    474
106  2005-06    434
125  2006-07    436
145  2007-08    432
164  2008-09    417
186  2009-10    441
206  2010-11    405
224  2011-12    403
242  2012-13    371
263  2013-14    420
283  2014-15    382
303  2015-16    408
323  2016-17    394
344  2017-18    363
365  2018-19    339
385  2019-20    386
404  2020-21    433
425  2021-22    324

答案2

得分: 2

使用pd.wide_to_long来解除数据帧的脱离形式,有时候对于未来的分析会很有好处:

out = (pd.wide_to_long(df.rename_axis('MatchId').reset_index(), 
                       stubnames=('Home', 'Away'),
                       i=['MatchId', 'Season'], j='Variable', suffix='\w+')
         .rename_axis(columns='HomeAway').stack()
         .unstack('Variable').rename_axis(columns=None).reset_index())
print(out)

# 输出
    MatchId   Season HomeAway Foul       Team Yellow
0         0  2000-01     Home   19    Chelsea      1
1         0  2000-01     Away   14   West Ham      2
2         1  2000-01     Home   14   Bradford      0
3         1  2000-01     Away   16    Chelsea      1
4         2  2000-01     Home   16    Chelsea      3
5         2  2000-01     Away   22    Arsenal      1
6         3  2002-03     Home   11    Chelsea      2
7         3  2002-03     Away   12  Blackburn      1
8         4  2002-03     Home   12   N.castle      0
9         4  2002-03     Away   22    Chelsea      1
10        5  2006-07     Home   18    Watford      1
11        5  2006-07     Away    9    Chelsea      3
12        6  2006-07     Home   13    Chelsea      2
13        6  2006-07     Away   11  Tottenham      1

之后:

>>> out.groupby(['Season', 'Team'], as_index=False)[['Foul', 'Yellow']].sum()
    Season       Team Foul Yellow
0  2000-01    Arsenal   22      1
1  2000-01   Bradford   14      0
2  2000-01    Chelsea   51      5
3  2000-01   West Ham   14      2
4  2002-03  Blackburn   12      1
5  2002-03    Chelsea   33      3
6  2002-03   N.castle   12      0
7  2006-07    Chelsea   22      5
8  2006-07  Tottenham   11      1
9  2006-07    Watford   18      1

对于切尔西队:

# 单独
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['YellowCards'].sum()
    Season  YellowCards
0  2000-01            5
1  2002-03            3
2  2006-07            5

# 单独
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['Fouls'].sum()
    Season  Fouls
0  2000-01     51
1  2002-03     33
2  2006-07     22

# 综合
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)[['YellowCards', 'Fouls']].sum()
    Season  YellowCards  Fouls
0  2000-01            5     51
1  2002-03            3     33
2  2006-07            5     22

方法仍然相同,只需在groupby_sum之前筛选数据帧,因此可以使用以下函数:

def search(df, team, variable):
    return df[df['Team'] == team].groupby('Season', as_index=False)[variable].sum()

print(search(out, 'Chelsea', 'Fouls'))

# 输出
    Season  Fouls
0  2000-01     51
1  2002-03     33
2  2006-07     22
英文:

Sometimes it could be good to change the shape of our dataframe for future analysis. Use pd.wide_to_long to unpivot your dataframe:

out = (pd.wide_to_long(df.rename_axis('MatchId').reset_index(), 
                       stubnames=('Home', 'Away'),
                       i=['MatchId', 'Season'], j='Variable', suffix='\w+')
         .rename_axis(columns='HomeAway').stack()
         .unstack('Variable').rename_axis(columns=None).reset_index())
print(out)

# Out
    MatchId   Season HomeAway Foul       Team Yellow
0         0  2000-01     Home   19    Chelsea      1
1         0  2000-01     Away   14   West Ham      2
2         1  2000-01     Home   14   Bradford      0
3         1  2000-01     Away   16    Chelsea      1
4         2  2000-01     Home   16    Chelsea      3
5         2  2000-01     Away   22    Arsenal      1
6         3  2002-03     Home   11    Chelsea      2
7         3  2002-03     Away   12  Blackburn      1
8         4  2002-03     Home   12   N.castle      0
9         4  2002-03     Away   22    Chelsea      1
10        5  2006-07     Home   18    Watford      1
11        5  2006-07     Away    9    Chelsea      3
12        6  2006-07     Home   13    Chelsea      2
13        6  2006-07     Away   11  Tottenham      1

After that:

>>> out.groupby(['Season', 'Team'], as_index=False)[['Foul', 'Yellow']].sum()
    Season       Team Foul Yellow
0  2000-01    Arsenal   22      1
1  2000-01   Bradford   14      0
2  2000-01    Chelsea   51      5
3  2000-01   West Ham   14      2
4  2002-03  Blackburn   12      1
5  2002-03    Chelsea   33      3
6  2002-03   N.castle   12      0
7  2006-07    Chelsea   22      5
8  2006-07  Tottenham   11      1
9  2006-07    Watford   18      1

For Chelsea:

# Individual
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['YellowCards'].sum()
    Season  YellowCards
0  2000-01            5
1  2002-03            3
2  2006-07            5

# Individual
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)['Fouls'].sum()
    Season  Fouls
0  2000-01     51
1  2002-03     33
2  2006-07     22

# Together
>>> out[out['Team'] == 'Chelsea'].groupby('Season', as_index=False)[['YellowCards', 'Fouls']].sum()
    Season  YellowCards  Fouls
0  2000-01            5     51
1  2002-03            3     33
2  2006-07            5     22

The method still the same, you just have to filter the dataframe before the groupby_sum, so you can use a function:

def search(df, team, variable):
    return df[df['Team'] == team].groupby('Season', as_index=False)[variable].sum()

print(search(out, 'Chelsea', 'Fouls'))

# Output
    Season  Fouls
0  2000-01     51
1  2002-03     33
2  2006-07     22

huangapple
  • 本文由 发表于 2023年5月10日 14:23:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76215421.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定