Unable to scrape the second table from web page using R

Question

I am trying to scrape the second table, "Player Standard Stats", from this web page in R: https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard

I am using the following code:

url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard"

xG_ind <- url %>%
  xml2::read_html() %>%
  rvest::html_nodes('table') %>%
  html_table() %>%
  .[[1]]

This only lets me scrape the first table on the page, "Squad Standard Stats". How can I get the second table?

Answer 1

Score: 2

The Player Standard Stats table is delivered inside a commented-out HTML block, so rvest ignores it. Probably the simplest (or just plain lazy) approach is to blindly remove all HTML comment delimiters from the source HTML string and parse the result. In this particular case it seems to work:

library(rvest)
library(httr)

url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard"

# Fetch the raw page source as text
html_resp <- GET(url)
# Strip the comment delimiters so the hidden table becomes regular markup
html <- content(html_resp, as = "text") %>%
  stringr::str_remove_all("(<!--|-->)") %>%
  read_html()

# Select the player table by its id and convert it to a tibble
html %>%
  html_element("table#stats_standard") %>%
  html_table()
#> # A tibble: 508 × 33
#>    ``    ``        ``    ``    ``    ``    ``    Playi…¹ Playi…² Playi…³ Playi…⁴
#>    <chr> <chr>     <chr> <chr> <chr> <chr> <chr> <chr>   <chr>   <chr>   <chr>  
#>  1 Rk    Player    Nati… Pos   Squad Age   Born  MP      Starts  Min     90s    
#>  2 1     Brenden … us U… MF,FW Leed… 22-0… 2000  17      17      1,423   15.8   
#>  3 2     Che Adams sct … FW    Sout… 26-1… 1996  17      15      1,336   14.8   
#>  4 3     Tyler Ad… us U… MF    Leed… 23-3… 1999  15      15      1,346   15.0   
#>  5 4     Tosin Ad… eng … DF    Fulh… 25-1… 1997  12      11      991     11.0   
#>  6 5     Nayef Ag… ma M… DF    West… 26-2… 1996  2       1       166     1.8    
#>  7 6     Rayan Aï… fr F… DF    Wolv… 21-2… 2001  13      7       749     8.3    
#>  8 7     Kristoff… no N… DF    Bren… 24-2… 1998  6       6       502     5.6    
#>  9 8     Manuel A… ch S… DF    Manc… 27-1… 1995  11      10      926     10.3   
#> 10 9     Nathan A… nl N… DF    Manc… 27-3… 1995  11      10      841     9.3    
#> # … with 498 more rows, 22 more variables: Performance <chr>,
#> #   Performance <chr>, Performance <chr>, Performance <chr>, Performance <chr>,
#> #   Performance <chr>, Performance <chr>, `Per 90 Minutes` <chr>,
#> #   `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>,
#> #   `Per 90 Minutes` <chr>, Expected <chr>, Expected <chr>, Expected <chr>,
#> #   Expected <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>,
#> #   `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, `Per 90 Minutes` <chr>, …

Created on 2023-01-09 with reprex v2.0.2
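
If stripping every comment delimiter feels too blunt, a somewhat more targeted variant is to read only the comment nodes and re-parse their contents. The following is a minimal offline sketch of that idea, not a tested recipe for fbref itself: the toy markup, the table ids `squads` and `players`, and the variable names are all made up for illustration, and it assumes rvest 1.0.0 or later for `html_elements()`.

```r
library(rvest)  # provides the %>% pipe and re-exports read_html() from xml2

# Hypothetical page source: the second table is wrapped in an HTML comment
page_src <- '
<html><body>
  <table id="squads"><tr><th>Squad</th></tr><tr><td>Leeds</td></tr></table>
  <!--
  <table id="players"><tr><th>Player</th><th>MP</th></tr>
    <tr><td>Che Adams</td><td>17</td></tr></table>
  -->
</body></html>'

page <- read_html(page_src)

# Only the uncommented table is reachable through a CSS selector
n_visible <- length(html_elements(page, "table"))

# Collect the text of every comment node, parse it as HTML,
# then pick the table by its id
players <- page %>%
  html_elements(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "") %>%
  read_html() %>%
  html_element("table#players") %>%
  html_table()

print(n_visible)
print(players)
```

On a real page the same pipeline would start from `read_html(url)` and use the id from the answer above (`table#stats_standard`), on the assumption that the site still serves that table inside a comment.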

Posted by huangapple on 2023-01-09 17:55:12
Link: https://go.coder-hub.com/75055576.html