Rvest Tables Returning Empty

Question

I am trying to scrape the tables from the following link: https://www.mlbdraftleague.com/mahoning-valley/roster

library(rvest)
library(magrittr)

url <- "https://www.mlbdraftleague.com/mahoning-valley/roster"
page <- read_html(url) %>%
  html_table(fill = TRUE)

I tried that, and it returned the right number of tables (5) with the right number of columns, but the data frames are empty. All help is appreciated.

Answer 1

Score: 1

library(tidyverse)
library(httr2)

"https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22" %>%
  request() %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  pluck("roster") %>%
  unnest(everything(), names_sep = "_")

# A tibble: 41 × 47
   person_id person_full_name      person_link   person_first_name person_last_name person_birth_date person_current_age person_birth_city person_birth_state_p…¹
       <int> <chr>                 <chr>         <chr>             <chr>            <chr>                          <int> <chr>             <chr>                 
 1    813834 AJ Rausch             /api/v1/peop… AJ                Rausch           2002-03-19                        21 Powell            OH                    
 2    800677 Ahmad Harajli         /api/v1/peop… Ahmad             Harajli          2001-08-31                        21 Dearborn          MI                    
 3    701144 Alex Shea             /api/v1/peop… Brian             Shea             2001-05-04                        22 Union             KY                    
 4    701475 Andreaus Lewis        /api/v1/peop… Andreaus          Lewis            2002-12-10                        20 Atlanta           GA                    
 5    701499 Andrew Lucas          /api/v1/peop… Andrew            Lucas            2000-02-04                        23 Camarillo         CA                    
 6    813836 Braeden O'Shaughnessy /api/v1/peop… Braeden           O'Shaughnessy    2000-11-19                        22 Poland            OH                    
 7    681376 Brandon Hylton        /api/v1/peop… Brandon           Hylton           2000-02-01                        23 Livingston        NJ                    
 8    695480 Brennyn Abendroth     /api/v1/peop… Brennyn           Abendroth        2003-06-07                        20 Effingham         IL                    
 9    695746 Cale Lansville        /api/v1/peop… Cale              Lansville        2003-01-06                        20 Englewood         CO                    
10    809953 Cam Liss              /api/v1/peop… Cameron           Liss             2000-04-15                        23 Spokane           WA                    
# ℹ 31 more rows
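
A likely reason the rvest attempt came back empty: `read_html()` only fetches the HTML the server sends and does not run JavaScript, so tables whose rows are filled in by the browser arrive with headers but no body. That is an assumption about this site rather than something confirmed in the thread, though it fits the symptoms and would explain why querying the JSON endpoint directly works. A minimal offline sketch, using a made-up header-only table as a stand-in for the roster page:

```r
library(rvest)

# Hypothetical stand-in for the static HTML of the roster page:
# the header row is present, the body rows would be added later by JavaScript.
static_page <- minimal_html('
  <table>
    <thead><tr><th>Name</th><th>Pos</th></tr></thead>
    <tbody></tbody>
  </table>')

# html_table() on a document returns a list with one tibble per <table>
tbl <- html_table(static_page)[[1]]

nrow(tbl)  # number of data rows found in the static HTML
ncol(tbl)  # number of columns taken from the header row
```

If the real page behaves the same way, the browser's network tab is usually the quickest way to find the request that actually carries the data.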

Answer 2

Score: 0

One option is to pluck the relevant columns one by one from the nested list, though that makes more sense when there are only a few of them, as with the coaches:

library(dplyr)
library(jsonlite)
library(purrr)
url_roster  <- "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22"
url_coaches <- "https://statsapi.mlb.com/api/v1/teams/545/coaches?hydrate=person&rosterType=active&sportId=22&season=2023"

fromJSON(url_roster, simplifyVector = FALSE)[["roster"]] %>%
  map(~ list(pos    = pluck(.x, "person", "primaryPosition", "type"),
             name   = pluck(.x, "person", "fullName"),
             j_nr   = pluck(.x, "jerseyNumber"),
             wins   = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins"),
             losses = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "losses"),
             era    = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era"),
             b_side = pluck(.x, "person", "batSide", "code"),
             p_hand = pluck(.x, "person", "pitchHand", "code"),
             height = pluck(.x, "person", "height"),
             weight = pluck(.x, "person", "weight"),
             dob    = pluck(.x, "person", "birthDate"),
             school = pluck(.x, "person", "education", "colleges", 1, "name")
             )) %>%
  bind_rows()
#> # A tibble: 41 × 12
#>    pos   name  j_nr  b_side p_hand height weight dob   school  wins losses era
#>    <chr> <chr> <chr> <chr>  <chr>  <chr>   <int> <chr> <chr>  <int>  <int> <chr>
#>  1 Outf… AJ R… "17"  R      R      "5' 1…    195 2002… Ohio      NA     NA <NA>
#>  2 Pitc… Ahma… "41"  R      R      "6' 4…    240 2001… Michi…     1      1 6.14
#>  3 Pitc… Alex… "48"  L      L      "6' 4…    211 2001… Cinci…     0      0 12.60
#>  4 Catc… Andr… "9"   L      R      "5' 1…    195 2002… Pensa…    NA     NA <NA>
#>  5 Pitc… Andr… "33"  R      R      "5' 1…    190 2000… Texas…     1      1 1.80
#>  6 Infi… Brae… "4"   R      R      "6' 3…    200 2000… Young…    NA     NA <NA>
#>  7 Outf… Bran… "32"  L      R      "6' 8…    255 2000… Stets…    NA     NA <NA>
#>  8 Pitc… Bren… "52"  R      R      "6' 4…    195 2003… South…     0      0 6.00
#>  9 Pitc… Cale… ""    R      R      "6' 1…    205 2003… San J…    NA     NA <NA>
#> 10 Pitc… Cam … "46"  L      L      "6' 0…    202 2000… Washi…     0      0 0.00
#> # ℹ 31 more rows

fromJSON(url_coaches, simplifyVector = FALSE)[["roster"]] %>%
  map(~ list(title  = pluck(.x, "title"),
             name   = pluck(.x, "person", "fullName"))) %>%
  bind_rows()
#> # A tibble: 4 × 2
#>   title           name
#>   <chr>           <chr>
#> 1 Manager         Dmitri Young
#> 2 Hitting Coach   Bryant Nelson
#> 3 Pitching Coach  Ray King
#> 4 Assistant Coach Craig Antush
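
On the plucking approach above: `pluck()` returns `NULL` when any step of the path is missing, which is why position players without pitching stats show `NA` for wins, losses and ERA in the roster output. If you would rather make that explicit, `pluck()` also takes a `.default` argument. A small offline sketch, assuming a trimmed-down response with invented names and values (not real API output):

```r
library(dplyr)
library(jsonlite)
library(purrr)

# Hypothetical, heavily trimmed stand-in for the roster response; values are made up.
json <- '{
  "roster": [
    {"jerseyNumber": "41",
     "person": {"fullName": "Pitcher A",
                "stats": [{"splits": [{"stat": {"era": "6.14"}}]}]}},
    {"jerseyNumber": "17",
     "person": {"fullName": "Outfielder B"}}
  ]
}'

roster <- fromJSON(json, simplifyVector = FALSE)[["roster"]]

players <- roster %>%
  map(~ list(
    name = pluck(.x, "person", "fullName"),
    j_nr = pluck(.x, "jerseyNumber"),
    # .default is returned when any step of the path does not exist
    era  = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era",
                 .default = NA_character_)
  )) %>%
  bind_rows()

players
```

Spelling out the default keeps every row the same shape, so the column type does not depend on which player happens to come first.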

Created on 2023-06-13 with reprex v2.0.2

You might also want to check out the baseballr package.

huangapple
  • Posted on June 13, 2023 at 14:04:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76462071.html