Rvest Tables Returning Empty
Question
I am trying to scrape the tables from the following link: https://www.mlbdraftleague.com/mahoning-valley/roster
library(rvest)
library(magrittr)
url <- "https://www.mlbdraftleague.com/mahoning-valley/roster"
page <- read_html(url) %>%
  html_table(fill = TRUE)
I tried that, and it returned the right number of tables (5) with the right number of columns, but the data frames are empty. All help is appreciated.
Answer 1
Score: 1
library(tidyverse)
library(httr2)
"https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22" %>%
  request() %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  pluck("roster") %>%
  unnest(everything(), names_sep = "_")
#> # A tibble: 41 × 47
#> person_id person_full_name person_link person_first_name person_last_name person_birth_date person_current_age person_birth_city person_birth_state_p…¹
#> <int> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 813834 AJ Rausch /api/v1/peop… AJ Rausch 2002-03-19 21 Powell OH
#> 2 800677 Ahmad Harajli /api/v1/peop… Ahmad Harajli 2001-08-31 21 Dearborn MI
#> 3 701144 Alex Shea /api/v1/peop… Brian Shea 2001-05-04 22 Union KY
#> 4 701475 Andreaus Lewis /api/v1/peop… Andreaus Lewis 2002-12-10 20 Atlanta GA
#> 5 701499 Andrew Lucas /api/v1/peop… Andrew Lucas 2000-02-04 23 Camarillo CA
#> 6 813836 Braeden O'Shaughnessy /api/v1/peop… Braeden O'Shaughnessy 2000-11-19 22 Poland OH
#> 7 681376 Brandon Hylton /api/v1/peop… Brandon Hylton 2000-02-01 23 Livingston NJ
#> 8 695480 Brennyn Abendroth /api/v1/peop… Brennyn Abendroth 2003-06-07 20 Effingham IL
#> 9 695746 Cale Lansville /api/v1/peop… Cale Lansville 2003-01-06 20 Englewood CO
#> 10 809953 Cam Liss /api/v1/peop… Cameron Liss 2000-04-15 23 Spokane WA
#> # ℹ 31 more rows
Answer 2
Score: 0
One option is to pluck() the relevant fields one by one from the nested list, though this makes more sense when there are only a few of them, as with the coaches:
library(dplyr)
library(jsonlite)
library(purrr)
url_roster <- "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22"
url_coaches <- "https://statsapi.mlb.com/api/v1/teams/545/coaches?hydrate=person&rosterType=active&sportId=22&season=2023"
# Parse without simplification so that each player stays a nested list
fromJSON(url_roster, simplifyVector = FALSE)[["roster"]] %>%
  # Fields missing for a player (e.g. pitching stats) come back as NULL
  # and show up as NA after bind_rows()
  map(~ list(pos    = pluck(.x, "person", "primaryPosition", "type"),
             name   = pluck(.x, "person", "fullName"),
             j_nr   = pluck(.x, "jerseyNumber"),
             wins   = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins"),
             losses = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "losses"),
             era    = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era"),
             b_side = pluck(.x, "person", "batSide", "code"),
             p_hand = pluck(.x, "person", "pitchHand", "code"),
             height = pluck(.x, "person", "height"),
             weight = pluck(.x, "person", "weight"),
             dob    = pluck(.x, "person", "birthDate"),
             school = pluck(.x, "person", "education", "colleges", 1, "name")
  )) %>%
  bind_rows()
#> # A tibble: 41 × 12
#> pos name j_nr b_side p_hand height weight dob school wins losses era
#> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <int> <int> <chr>
#> 1 Outf… AJ R… "17" R R "5' 1… 195 2002… Ohio NA NA <NA>
#> 2 Pitc… Ahma… "41" R R "6' 4… 240 2001… Michi… 1 1 6.14
#> 3 Pitc… Alex… "48" L L "6' 4… 211 2001… Cinci… 0 0 12.60
#> 4 Catc… Andr… "9" L R "5' 1… 195 2002… Pensa… NA NA <NA>
#> 5 Pitc… Andr… "33" R R "5' 1… 190 2000… Texas… 1 1 1.80
#> 6 Infi… Brae… "4" R R "6' 3… 200 2000… Young… NA NA <NA>
#> 7 Outf… Bran… "32" L R "6' 8… 255 2000… Stets… NA NA <NA>
#> 8 Pitc… Bren… "52" R R "6' 4… 195 2003… South… 0 0 6.00
#> 9 Pitc… Cale… "" R R "6' 1… 205 2003… San J… NA NA <NA>
#> 10 Pitc… Cam … "46" L L "6' 0… 202 2000… Washi… 0 0 0.00
#> # ℹ 31 more rows
fromJSON(url_coaches, simplifyVector = FALSE)[["roster"]] %>%
  map(~ list(title = pluck(.x, "title"),
             name  = pluck(.x, "person", "fullName"))) %>%
  bind_rows()
#> # A tibble: 4 × 2
#> title name
#> <chr> <chr>
#> 1 Manager Dmitri Young
#> 2 Hitting Coach Bryant Nelson
#> 3 Pitching Coach Ray King
#> 4 Assistant Coach Craig Antush
Created on 2023-06-13 with reprex v2.0.2
You might also want to check the baseballr package.
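To try the pluck() approach from this answer without calling the API, here is a minimal sketch on a hand-built nested list. The toy_roster object and its player names are made up for illustration and only loosely mimic the shape of the parsed JSON; the .default argument is an optional addition that gives missing paths a typed NA instead of NULL.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for fromJSON(url, simplifyVector = FALSE)[["roster"]];
# the names and values below are invented for illustration only.
toy_roster <- list(
  list(
    jerseyNumber = "17",
    person = list(
      fullName = "Player A",
      primaryPosition = list(type = "Outfielder")
    )
  ),
  list(
    jerseyNumber = "41",
    person = list(
      fullName = "Player B",
      primaryPosition = list(type = "Pitcher"),
      stats = list(list(splits = list(list(stat = list(wins = 1L)))))
    )
  )
)

toy_tbl <- toy_roster %>%
  map(~ list(
    name = pluck(.x, "person", "fullName"),
    pos  = pluck(.x, "person", "primaryPosition", "type"),
    j_nr = pluck(.x, "jerseyNumber"),
    # Player A has no "stats" entry, so .default supplies a typed NA
    wins = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins",
                 .default = NA_integer_)
  )) %>%
  bind_rows()

print(toy_tbl)
```

Each inner list becomes one row, so the result has one row per player and one column per plucked field. Supplying .default for every optional field keeps the columns in the order they are listed.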