Rvest Tables Returning Empty

Question

I am trying to scrape the tables from the following link: https://www.mlbdraftleague.com/mahoning-valley/roster

    library(rvest)
    library(magrittr)

    url <- "https://www.mlbdraftleague.com/mahoning-valley/roster"
    page <- read_html(url) %>%
      html_table(fill = TRUE)

I tried that, and it returned the right number of data frames (5) with the right number of columns, but every data frame is empty. All help is appreciated.

Answer 1

Score: 1

    library(tidyverse)
    library(httr2)

    # Request the roster as JSON from the MLB Stats API
    "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22" %>%
      request() %>%
      req_perform() %>%
      resp_body_json(simplifyVector = TRUE) %>%
      pluck("roster") %>%
      # Spread the nested columns into prefixed columns such as person_id
      unnest(everything(), names_sep = "_")
    #> # A tibble: 41 × 47
    #>    person_id person_full_name      person_link   person_first_name person_last_name person_birth_date person_current_age person_birth_city person_birth_state_p…¹
    #>        <int> <chr>                 <chr>         <chr>             <chr>            <chr>                          <int> <chr>             <chr>
    #>  1    813834 AJ Rausch             /api/v1/peop… AJ                Rausch           2002-03-19                        21 Powell            OH
    #>  2    800677 Ahmad Harajli         /api/v1/peop… Ahmad             Harajli          2001-08-31                        21 Dearborn          MI
    #>  3    701144 Alex Shea             /api/v1/peop… Brian             Shea             2001-05-04                        22 Union             KY
    #>  4    701475 Andreaus Lewis        /api/v1/peop… Andreaus          Lewis            2002-12-10                        20 Atlanta           GA
    #>  5    701499 Andrew Lucas          /api/v1/peop… Andrew            Lucas            2000-02-04                        23 Camarillo         CA
    #>  6    813836 Braeden O'Shaughnessy /api/v1/peop… Braeden           O'Shaughnessy    2000-11-19                        22 Poland            OH
    #>  7    681376 Brandon Hylton        /api/v1/peop… Brandon           Hylton           2000-02-01                        23 Livingston        NJ
    #>  8    695480 Brennyn Abendroth     /api/v1/peop… Brennyn           Abendroth        2003-06-07                        20 Effingham         IL
    #>  9    695746 Cale Lansville        /api/v1/peop… Cale              Lansville        2003-01-06                        20 Englewood         CO
    #> 10    809953 Cam Liss              /api/v1/peop… Cameron           Liss             2000-04-15                        23 Spokane           WA
    #> # ℹ 31 more rows
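
For context on the symptom in the question: `html_table()` takes its column names from the header row and its data from the remaining `<tr>` elements, so a table whose body has not been populated parses to a data frame with the right columns and zero rows. That is presumably what happens here if the roster rows are filled in by JavaScript after the page loads. This is an assumption, since the question does not show the page source. A minimal sketch with made-up HTML (not the actual roster page):

```r
library(rvest)

# Hypothetical stand-in for the roster page: the header row is present in the
# static HTML, but the table body is empty until a script fills it in.
page <- minimal_html('
  <table>
    <thead><tr><th>Name</th><th>Pos</th></tr></thead>
    <tbody></tbody>
  </table>
')

tables <- html_table(page)

length(tables)     # number of <table> elements found
ncol(tables[[1]])  # columns are taken from the header row
nrow(tables[[1]])  # no data rows are available to read
```

If the real page behaves like this, no `html_table()` option will recover the rows, which is why requesting the JSON directly is the more practical route.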

Answer 2

Score: 0


One option is to pluck the relevant fields one by one from the nested list, though this makes more sense when there are only a few of them, as with the coaches:

    library(dplyr)
    library(jsonlite)
    library(purrr)

    url_roster <- "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22"
    url_coaches <- "https://statsapi.mlb.com/api/v1/teams/545/coaches?hydrate=person&rosterType=active&sportId=22&season=2023"

    # Players: pick each field out of the nested list, one list per player,
    # then stack the lists into a tibble. Fields that are absent for a player
    # show up as NA in the output.
    fromJSON(url_roster, simplifyVector = FALSE)[["roster"]] %>%
      map(~ list(pos    = pluck(.x, "person", "primaryPosition", "type"),
                 name   = pluck(.x, "person", "fullName"),
                 j_nr   = pluck(.x, "jerseyNumber"),
                 wins   = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins"),
                 losses = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "losses"),
                 era    = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era"),
                 b_side = pluck(.x, "person", "batSide", "code"),
                 p_hand = pluck(.x, "person", "pitchHand", "code"),
                 height = pluck(.x, "person", "height"),
                 weight = pluck(.x, "person", "weight"),
                 dob    = pluck(.x, "person", "birthDate"),
                 school = pluck(.x, "person", "education", "colleges", 1, "name")
      )) %>%
      bind_rows()
    #> # A tibble: 41 × 12
    #>    pos   name  j_nr  b_side p_hand height weight dob   school  wins losses era
    #>    <chr> <chr> <chr> <chr>  <chr>  <chr>   <int> <chr> <chr>  <int>  <int> <chr>
    #>  1 Outf… AJ R… "17"  R      R      "5' 1…    195 2002… Ohio      NA     NA <NA>
    #>  2 Pitc… Ahma… "41"  R      R      "6' 4…    240 2001… Michi…     1      1 6.14
    #>  3 Pitc… Alex… "48"  L      L      "6' 4…    211 2001… Cinci…     0      0 12.60
    #>  4 Catc… Andr… "9"   L      R      "5' 1…    195 2002… Pensa…    NA     NA <NA>
    #>  5 Pitc… Andr… "33"  R      R      "5' 1…    190 2000… Texas…     1      1 1.80
    #>  6 Infi… Brae… "4"   R      R      "6' 3…    200 2000… Young…    NA     NA <NA>
    #>  7 Outf… Bran… "32"  L      R      "6' 8…    255 2000… Stets…    NA     NA <NA>
    #>  8 Pitc… Bren… "52"  R      R      "6' 4…    195 2003… South…     0      0 6.00
    #>  9 Pitc… Cale… ""    R      R      "6' 1…    205 2003… San J…    NA     NA <NA>
    #> 10 Pitc… Cam … "46"  L      L      "6' 0…    202 2000… Washi…     0      0 0.00
    #> # ℹ 31 more rows

    # Coaches: only two fields are needed
    fromJSON(url_coaches, simplifyVector = FALSE)[["roster"]] %>%
      map(~ list(title = pluck(.x, "title"),
                 name  = pluck(.x, "person", "fullName"))) %>%
      bind_rows()
    #> # A tibble: 4 × 2
    #>   title           name
    #>   <chr>           <chr>
    #> 1 Manager         Dmitri Young
    #> 2 Hitting Coach   Bryant Nelson
    #> 3 Pitching Coach  Ray King
    #> 4 Assistant Coach Craig Antush

<sup>Created on 2023-06-13 with reprex v2.0.2</sup>

You might also want to check the baseballr package.
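
To try the plucking approach without hitting the API, here is a small sketch on a hand-written nested list. The sample data and the names `roster` and `players` are hypothetical and only loosely shaped like the real response. It also passes `.default = NA_integer_` to `pluck()`, an optional tweak that is not part of the answer above, so that a missing field becomes an explicit `NA` rather than a `NULL`:

```r
library(dplyr)
library(purrr)

# Hypothetical sample data, loosely shaped like the "roster" list
roster <- list(
  list(jerseyNumber = "41",
       person = list(fullName = "Player A",
                     stats = list(list(splits = list(list(stat = list(wins = 1L))))))),
  list(jerseyNumber = "17",
       person = list(fullName = "Player B"))  # this player has no stats entry
)

players <- roster %>%
  map(~ list(
    name = pluck(.x, "person", "fullName"),
    j_nr = pluck(.x, "jerseyNumber"),
    # Fall back to NA when any step of the path is missing
    wins = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins",
                 .default = NA_integer_)
  )) %>%
  bind_rows()

players
```

With an explicit default, every per-player list has the same fields, so the column order of the result does not depend on which fields happen to be present for the first player.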

huangapple
  • Posted on June 13, 2023, 14:04:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76462071.html