Is there a Python function that allows me to sum the last goals of a team in a dataframe?

Question

I'm teaching myself football data science and want to analyse the Turkish league. I'm trying to analyse the past 5 games of both teams in each upcoming fixture. I've already built a dataframe with the season's games. Now I want to build a goal-points indicator for each team, computed from the start of the season. The idea is: I count how many goals the home team scored in its past 5 matches, where goals scored at home count as 1 point and goals scored away count as 2 points. I do the same for the away team and for goals conceded. At the moment, when I filter the home team's last 5 games, I only get its last 5 games at home, not its last 5 overall.

This is my code so far:

import pandas as pd

webs = pd.read_csv('https://www.football-data.co.uk/mmz4281/2223/T1.csv')
tk = webs[['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']]
tk.columns = ['Data', 'Home', 'Away', 'FT_Gols_H', 'FT_Gols_A', 'FT_Result']

tk['Goal_Points'] = tk.groupby('Home')['FT_Gols_H'].rolling(window=5, min_periods=1).sum().reset_index(0, drop=True).shift(1)

Answer 1

Score: 0

I'd suggest reshaping your data a little. Turn the dataframe into one with 3 columns: the date, the team, and the goals that team scored on that date, regardless of whether it played at home or away.

Example:

home_team_games = tk[['Data', 'Home', 'FT_Gols_H']].rename(columns={
    'Home': 'Team',
    'FT_Gols_H': 'Goals',
})
away_team_games = tk[['Data', 'Away', 'FT_Gols_A']].rename(columns={
    'Away': 'Team',
    'FT_Gols_A': 'Goals',
})
tk = pd.concat([home_team_games, away_team_games])
tk = tk.sort_values('Data')

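Then you have a dataframe like this. Note that Data is still a string in dd/mm/yyyy form here, so sort_values orders it lexicographically rather than chronologically (01/02/2023 sorts before 31/10/2022). Convert it with pd.to_datetime(tk['Data'], dayfirst=True) before sorting if you need true date order.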

           Data          Team  Goals
193  01/02/2023   Antalyaspor      0
192  01/02/2023  Umraniyespor      2
191  01/02/2023   Giresunspor      1
190  01/02/2023     Hatayspor      1
193  01/02/2023   Trabzonspor      2
..          ...           ...    ...
189  31/01/2023    Karagumruk      1
107  31/10/2022   Antalyaspor      2
107  31/10/2022     Sivasspor      0
106  31/10/2022   Giresunspor      1
106  31/10/2022    Buyuksehyr      3

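I had to modify your code a little to get it to work on this dataframe. I changed the column names, and I moved the rolling sum into transform so that it is computed within each team's rows and stays aligned with the original rows. The shift(1) now runs inside each group, so the current match is excluded and no value carries over from one team to the next.

tk['Goals_Prev_5_Games'] = tk.groupby('Team')['Goals'].transform(lambda s: s.rolling(window=5, min_periods=1).sum().shift(1))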
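
The rolling sum above counts every goal equally, while the question asks for home goals to be worth 1 point and away goals 2 points. Below is a minimal sketch of one way to add that weighting. It assumes a small made-up matches frame that reuses the question's column names. The helper to_long and the columns Points, Points_Prev_5 and Home_Points_Prev_5 are hypothetical names chosen for illustration. Dates are parsed with dayfirst=True so the sort is chronological.

```python
import pandas as pd

# Made-up sample with the question's column names (not real league data).
matches = pd.DataFrame({
    'Data': ['05/08/2022', '12/08/2022', '19/08/2022', '26/08/2022'],
    'Home': ['A', 'B', 'A', 'B'],
    'Away': ['B', 'A', 'B', 'A'],
    'FT_Gols_H': [2, 1, 0, 3],
    'FT_Gols_A': [1, 2, 1, 0],
})
# Parse dd/mm/yyyy strings so that sorting is chronological.
matches['Data'] = pd.to_datetime(matches['Data'], dayfirst=True)


def to_long(df, team_col, goals_col, weight):
    """Return one row per team per match, with goals weighted into points."""
    out = df[['Data', team_col, goals_col]].rename(
        columns={team_col: 'Team', goals_col: 'Goals'}
    )
    return out.assign(Points=out['Goals'] * weight)


# Home goals count 1 point each, away goals count 2 points each.
long_df = pd.concat(
    [to_long(matches, 'Home', 'FT_Gols_H', 1),
     to_long(matches, 'Away', 'FT_Gols_A', 2)],
    ignore_index=True,
).sort_values(['Data', 'Team']).reset_index(drop=True)

# Sum of points over each team's previous 5 matches, home or away.
# shift(1) inside the group excludes the current match.
long_df['Points_Prev_5'] = long_df.groupby('Team')['Points'].transform(
    lambda s: s.shift(1).rolling(window=5, min_periods=1).sum()
)

# Attach the home team's form back onto the match rows.
home_form = long_df.rename(
    columns={'Team': 'Home', 'Points_Prev_5': 'Home_Points_Prev_5'}
)[['Data', 'Home', 'Home_Points_Prev_5']]
matches = matches.merge(home_form, on=['Data', 'Home'], how='left')
```

A team's first match has no history, so its value is NaN. The same pattern should work for the away team (merge on Away instead of Home) and for conceded goals (pass the opponent's goals column to to_long).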

huangapple
  • Posted on 7 May 2023, 05:08:13
  • Original link: https://go.coder-hub.com/76191151.html