Rvest Tables Returning Empty
Question
I am trying to scrape the tables from the following link: https://www.mlbdraftleague.com/mahoning-valley/roster
library(rvest)
library(magrittr)
url <- "https://www.mlbdraftleague.com/mahoning-valley/roster"
page <- read_html(url) %>%
  html_table(fill = TRUE)
This returns the right number of tables (5), each with the right number of columns, but every data frame is empty. Any help is appreciated.
Answer 1
Score: 1
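A likely explanation, stated as an assumption rather than something verified here: the roster page ships only the table skeletons in its HTML and fills in the rows with JavaScript after load, which read_html() never runs. The sketch below reproduces that symptom offline with a made-up table (the HTML string is hypothetical, not copied from the site): a header with no body rows gives a data frame with the right columns and nothing in it.

```r
library(rvest)

# Hypothetical static HTML: a header row but an empty <tbody>,
# as a page might look before its JavaScript has run.
skeleton <- minimal_html('
  <table>
    <thead><tr><th>Name</th><th>Pos</th></tr></thead>
    <tbody></tbody>
  </table>')

tables <- html_table(skeleton)  # list with one tibble per <table>
empty_tbl <- tables[[1]]

names(empty_tbl)  # column names come from the header row
nrow(empty_tbl)   # no data rows are present in the static HTML
```

If that is what is happening, the data has to come from wherever the page itself fetches it, which is the approach taken below.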
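library(tidyverse)
library(httr2)
# Roster endpoint for team 545 (2023 season), taken from statsapi.mlb.com
"https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22" %>%
  request() %>%                              # build the request
  req_perform() %>%                          # send it
  resp_body_json(simplifyVector = TRUE) %>%  # parse the JSON body, simplifying to data frames
  pluck("roster") %>%                        # keep only the roster element
  unnest(everything(), names_sep = "_")      # flatten nested columns, prefixing their names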
# A tibble: 41 × 47
   person_id person_full_name      person_link   person_first_name person_last_name person_birth_date person_current_age person_birth_city person_birth_state_p…¹
       <int> <chr>                 <chr>         <chr>             <chr>            <chr>                          <int> <chr>             <chr>                 
 1    813834 AJ Rausch             /api/v1/peop… AJ                Rausch           2002-03-19                        21 Powell            OH                    
 2    800677 Ahmad Harajli         /api/v1/peop… Ahmad             Harajli          2001-08-31                        21 Dearborn          MI                    
 3    701144 Alex Shea             /api/v1/peop… Brian             Shea             2001-05-04                        22 Union             KY                    
 4    701475 Andreaus Lewis        /api/v1/peop… Andreaus          Lewis            2002-12-10                        20 Atlanta           GA                    
 5    701499 Andrew Lucas          /api/v1/peop… Andrew            Lucas            2000-02-04                        23 Camarillo         CA                    
 6    813836 Braeden O'Shaughnessy /api/v1/peop… Braeden           O'Shaughnessy    2000-11-19                        22 Poland            OH                    
 7    681376 Brandon Hylton        /api/v1/peop… Brandon           Hylton           2000-02-01                        23 Livingston        NJ                    
 8    695480 Brennyn Abendroth     /api/v1/peop… Brennyn           Abendroth        2003-06-07                        20 Effingham         IL                    
 9    695746 Cale Lansville        /api/v1/peop… Cale              Lansville        2003-01-06                        20 Englewood         CO                    
10    809953 Cam Liss              /api/v1/peop… Cameron           Liss             2000-04-15                        23 Spokane           WA                    
# ℹ 31 more rows
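For readers unfamiliar with the last two steps: with simplifyVector = TRUE the nested JSON objects become data-frame columns, and unnest(everything(), names_sep = "_") spreads those into prefixed top-level columns, which is where names like person_full_name come from. A minimal offline sketch, assuming a made-up two-player payload (the JSON is hypothetical and far smaller than the real response, and jsonlite::fromJSON stands in for resp_body_json so that no request is needed):

```r
library(jsonlite)
library(tidyr)

# Hypothetical payload shaped loosely like a roster response
json <- '{"roster": [
  {"person": {"id": 1, "fullName": "Player One"}, "jerseyNumber": "17"},
  {"person": {"id": 2, "fullName": "Player Two"}, "jerseyNumber": "41"}
]}'

# Nested objects become data-frame columns (here: `person`)
nested <- fromJSON(json, simplifyVector = TRUE)[["roster"]]

# Spread the nested columns into top-level ones, prefixed by the parent name
flat <- unnest(nested, everything(), names_sep = "_")

names(flat)  # the columns from `person` now carry a person_ prefix
```

The prefix matters because several nested objects can share field names such as id or name, which would otherwise collide after flattening.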
Answer 2
Score: 0
One option is to pluck the relevant columns one by one from the nested list, though this makes more sense when only a few columns are needed, as with the coaches:
library(dplyr)
library(jsonlite)
library(purrr)
url_roster  <- "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22"
url_coaches <- "https://statsapi.mlb.com/api/v1/teams/545/coaches?hydrate=person&rosterType=active&sportId=22&season=2023"
fromJSON(url_roster, simplifyVector = FALSE)[["roster"]] %>% 
  map(~ list(pos    = pluck(.x, "person", "primaryPosition", "type"),
             name   = pluck(.x, "person", "fullName"),
             j_nr   = pluck(.x, "jerseyNumber"),
             wins   = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins"),
             losses = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "losses"),
             era    = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era"),
             b_side = pluck(.x, "person", "batSide", "code"),
             p_hand = pluck(.x, "person", "pitchHand", "code"),
             height = pluck(.x, "person", "height"),
             weight = pluck(.x, "person", "weight"),
             dob    = pluck(.x, "person", "birthDate"),
             school = pluck(.x, "person", "education", "colleges", 1, "name")
             )) %>% 
  bind_rows()
#> # A tibble: 41 × 12
#>    pos   name  j_nr  b_side p_hand height weight dob   school  wins losses era  
#>    <chr> <chr> <chr> <chr>  <chr>  <chr>   <int> <chr> <chr>  <int>  <int> <chr>
#>  1 Outf… AJ R… "17"  R      R      "5' 1…    195 2002… Ohio      NA     NA <NA> 
#>  2 Pitc… Ahma… "41"  R      R      "6' 4…    240 2001… Michi…     1      1 6.14 
#>  3 Pitc… Alex… "48"  L      L      "6' 4…    211 2001… Cinci…     0      0 12.60
#>  4 Catc… Andr… "9"   L      R      "5' 1…    195 2002… Pensa…    NA     NA <NA> 
#>  5 Pitc… Andr… "33"  R      R      "5' 1…    190 2000… Texas…     1      1 1.80 
#>  6 Infi… Brae… "4"   R      R      "6' 3…    200 2000… Young…    NA     NA <NA> 
#>  7 Outf… Bran… "32"  L      R      "6' 8…    255 2000… Stets…    NA     NA <NA> 
#>  8 Pitc… Bren… "52"  R      R      "6' 4…    195 2003… South…     0      0 6.00 
#>  9 Pitc… Cale… ""    R      R      "6' 1…    205 2003… San J…    NA     NA <NA> 
#> 10 Pitc… Cam … "46"  L      L      "6' 0…    202 2000… Washi…     0      0 0.00 
#> # ℹ 31 more rows
fromJSON(url_coaches, simplifyVector = FALSE)[["roster"]] %>% 
  map(~ list(title  = pluck(.x, "title"),
             name   = pluck(.x, "person", "fullName"))) %>% 
  bind_rows()
#> # A tibble: 4 × 2
#>   title           name         
#>   <chr>           <chr>        
#> 1 Manager         Dmitri Young 
#> 2 Hitting Coach   Bryant Nelson
#> 3 Pitching Coach  Ray King     
#> 4 Assistant Coach Craig Antush
<sup>Created on 2023-06-13 with reprex v2.0.2</sup>
You might also want to check out the baseballr package.
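One detail of the pluck() approach above: position players have no pitching stats, so pluck() returns NULL for those paths and bind_rows() fills the gaps with NA, which is presumably also why wins, losses and era end up as the last columns in the output. A small offline sketch of a variation, assuming made-up player records (the list below is hypothetical): passing .default to pluck() puts an explicit NA in each row and keeps the columns in the order they were written.

```r
library(purrr)
library(dplyr)

# Hypothetical records: the first player has no `stats` entry at all
players <- list(
  list(person = list(fullName = "Player One"), jerseyNumber = "17"),
  list(person = list(
         fullName = "Player Two",
         stats = list(list(splits = list(list(stat = list(wins = 1L, era = "6.14")))))
       ),
       jerseyNumber = "41")
)

roster <- players %>%
  map(~ list(
    name = pluck(.x, "person", "fullName"),
    # .default is returned when any step of the path is missing
    wins = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins",
                 .default = NA_integer_),
    era  = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era",
                 .default = NA_character_)
  )) %>%
  bind_rows()

roster
```

Typed defaults such as NA_integer_ keep each column's type consistent across rows.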