英文:
Is there a Python function that allows me to sum the last goals of a team in a dataframe?
问题
I'm solo learning football data science, and want to analyse some things in the Turkish league. Now I'm trying to make some analyses from the past 5 games of both teams from the upcoming games. First I've already made a dataframe with the games of the season. Now I want to make an indicator of points for goals from each team, but since the start of the season. The idea is: I find how many goals the home team scored in the past 5 matches, where goals scored at home count as 1 point and goals scored away count as 2 points. I do the same for the away team and for the conceded goals. At the moment, when I filter the last 5 home team games, I only get their last 5 at home, and not the last five overall.
这是我的代码:
import pandas as pd
webs = pd.read_csv('https://www.football-data.co.uk/mmz4281/2223/T1.csv')
tk = webs[['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']]
tk.columns = ['Data', 'Home', 'Away', 'FT_Gols_H', 'FT_Gols_A', 'FT_Result']
tk['Goal_Points'] = tk.groupby('Home')['FT_Gols_H'].rolling(window=5, min_periods=1).sum().reset_index(0, drop=True).shift(1)
Is there anything else you'd like to know or clarify?
英文:
I'm solo learning football data science, and want to analyse some things in the turkish league. Now I'm trying to make some analyses from the past 5 games of both teams from the upcoming games. First I've already made a dataframe with the games of the season. Now I want to make a indicator of points of goals from each team, but since of the start from the season. The ideia is: I find how many goals the home team scored in the past 5 matches, goals scored at home count as 1 points and goals scored away count as 2 points. I make the same for the away team and for the conceded goals. At the moment, when I filter the last 5 home team games, I only get his last 5 at home, and not the last five overall.
This is my code so far
import pandas as pd
webs = pd.read_csv('https://www.football-data.co.uk/mmz4281/2223/T1.csv')
tk = webs[['Date','HomeTeam','AwayTeam','FTHG','FTAG','FTR']]
tk.columns = ['Data','Home','Away','FT_Gols_H','FT_Gols_A','FT_Result']
tk['Goal_Points'] = tk.groupby('Home')['FT_Gols_H'].rolling(window=5, min_periods=1).sum().reset_index(0,drop=True).shift(1)
答案1
得分: 0
我建议稍微调整你的数据格式。将数据框转换为包含3列的形式:日期、球队和该日期的进球数,无论他们是主场还是客场。
例子:
home_team_games = tk[['日期', '主队', '主队进球数']].rename(columns={
'主队': '球队',
'主队进球数': '进球数',
})
away_team_games = tk[['日期', '客队', '客队进球数']].rename(columns={
'客队': '球队',
'客队进球数': '进球数',
})
tk = pd.concat([home_team_games, away_team_games])
tk = tk.sort_values('日期')
然后你会得到这样的数据框:
日期 球队 进球数
193 01/02/2023 安塔利亚斯堡 0
192 01/02/2023 乌姆拉尼耶斯堡 2
191 01/02/2023 吉雷松斯波尔 1
190 01/02/2023 哈塔伊 1
193 01/02/2023 特拉布宗斯波尔 2
.. ... ... ...
189 31/01/2023 卡拉古姆鲁克 1
107 31/10/2022 安塔利亚斯堡 2
107 31/10/2022 锡瓦斯 0
106 31/10/2022 吉雷松斯波尔 1
106 31/10/2022 布尤克塞希尔 3
我稍微修改了你的代码,使其适用于这个数据框。我更改了列名,并且必须放弃索引的两个级别而不仅仅是第一个级别。
tk['前5场进球数'] = tk.groupby('球队')['进球数'].rolling(window=5, min_periods=1).sum().reset_index(drop=True).shift(1)
英文:
I'd suggest reformatting your data a little. Make the dataframe into one with 3 columns: the date, the team, and the goals they got on that date, no matter whether they were at home or away.
Example:
home_team_games = tk[['Data', 'Home', 'FT_Gols_H']].rename(columns={
'Home': 'Team',
'FT_Gols_H': 'Goals',
})
away_team_games = tk[['Data', 'Away', 'FT_Gols_A']].rename(columns={
'Away': 'Team',
'FT_Gols_A': 'Goals',
})
tk = pd.concat([home_team_games, away_team_games])
tk = tk.sort_values('Data')
Then you have a dataframe like this:
Data Team Goals
193 01/02/2023 Antalyaspor 0
192 01/02/2023 Umraniyespor 2
191 01/02/2023 Giresunspor 1
190 01/02/2023 Hatayspor 1
193 01/02/2023 Trabzonspor 2
.. ... ... ...
189 31/01/2023 Karagumruk 1
107 31/10/2022 Antalyaspor 2
107 31/10/2022 Sivasspor 0
106 31/10/2022 Giresunspor 1
106 31/10/2022 Buyuksehyr 3
I had to modify your code a little bit to get it to work on this data frame. I changed the name of the columns, and I also had to drop both levels of the index and not just the first level.
tk['Goals_Prev_5_Games'] = tk.groupby('Team')['Goals'].rolling(window=5, min_periods=1).sum().reset_index(drop=True).shift(1)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论