Unable to scrape the second table from web page using R
Question
I am trying to scrape the second table "Player Standard Stats" on this web page in R: "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard"
I am using the following code:
url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard"
xG_ind <- url %>%
  xml2::read_html() %>%
  rvest::html_nodes('table') %>%
  html_table() %>%
  .[[1]]
This only lets me scrape the first table on the page, "Squad Standard Stats". Could you advise how to get the second table?
Answer 1
Score: 2
The Player Standard Stats table is delivered inside a commented-out HTML block, so rvest ignores it. Probably the simplest (or just plain lazy) approach is to blindly remove all HTML comment delimiters from the source HTML string and parse the result. In this particular case it seems to work:
library(rvest)
library(httr)
url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard"
html_resp <- GET(url)
html <- content(html_resp, as = "text") %>%
  stringr::str_remove_all("(<!--|-->)") %>%
  read_html()
html %>%
  html_element("table#stats_standard") %>%
  html_table()
#> # A tibble: 508 × 33
#> `` `` `` `` `` `` `` Playi…¹ Playi…² Playi…³ Playi…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Rk Player Nati… Pos Squad Age Born MP Starts Min 90s
#> 2 1 Brenden … us U… MF,FW Leed… 22-0… 2000 17 17 1,423 15.8
#> 3 2 Che Adams sct … FW Sout… 26-1… 1996 17 15 1,336 14.8
#> 4 3 Tyler Ad… us U… MF Leed… 23-3… 1999 15 15 1,346 15.0
#> 5 4 Tosin Ad… eng … DF Fulh… 25-1… 1997 12 11 991 11.0
#> 6 5 Nayef Ag… ma M… DF West… 26-2… 1996 2 1 166 1.8
#> 7 6 Rayan Aï… fr F… DF Wolv… 21-2… 2001 13 7 749 8.3
#> 8 7 Kristoff… no N… DF Bren… 24-2… 1998 6 6 502 5.6
#> 9 8 Manuel A… ch S… DF Manc… 27-1… 1995 11 10 926 10.3
#> 10 9 Nathan A… nl N… DF Manc… 27-3… 1995 11 10 841 9.3
#> # … with 498 more rows, 22 more variables: Performance <chr>,
#> # Performance <chr>, Performance <chr>, Performance <chr>, Performance <chr>,
#> # Performance <chr>, Performance <chr>, `Per 90 Minutes` <chr>,
#> # `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>,
#> # `Per 90 Minutes` <chr>, Expected <chr>, Expected <chr>, Expected <chr>,
#> # Expected <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>,
#> # `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, …
Created on 2023-01-09 with reprex v2.0.2
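To see why the original html_nodes('table') call finds only one table, the effect can be reproduced offline. The snippet below is a minimal sketch, not the fbref page itself: page_src, the table ids "squads" and "players", and the cell values are made up for illustration. It shows the comment-stripping approach from the answer and, as a possible alternative that is not part of the answer, parsing only the contents of the comment nodes.

```r
library(rvest)    # also provides read_html() and the %>% pipe
library(stringr)

# Hypothetical miniature page: the second table sits inside an HTML comment.
page_src <- '
<html><body>
  <table id="squads"><tr><th>Squad</th></tr><tr><td>Arsenal</td></tr></table>
  <!--
  <table id="players"><tr><th>Player</th></tr><tr><td>Che Adams</td></tr></table>
  -->
</body></html>'

# Parsed as-is, only the uncommented table is visible to CSS selectors.
raw_doc <- read_html(page_src)
n_raw <- length(html_elements(raw_doc, "table"))

# Option 1 (as in the answer): strip the comment delimiters, then parse.
clean_doc <- page_src %>%
  str_remove_all("<!--|-->") %>%
  read_html()
n_clean <- length(html_elements(clean_doc, "table"))
players <- clean_doc %>%
  html_element("table#players") %>%
  html_table()

# Option 2 (alternative sketch): parse only the text held in comment nodes.
players_alt <- raw_doc %>%
  html_elements(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "") %>%
  read_html() %>%
  html_element("table#players") %>%
  html_table()
```

Option 2 leaves the rest of the page untouched, which may be preferable when a page contains comments that are not meant to be parsed as markup. Whether either option keeps working on the real site depends on how that page is served.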