Scrape a webpage and put it into table form using rvest

Question

I am trying to scrape the card grades from this page and put them into a table with one column per grade:
https://www.psacard.com/pop/basketball-cards/1986/fleer/36766

The only code that picks anything up is the following. I have also tried html_table(), but it returns nothing.

    library(rvest)

    # Read the page and extract all of its text as a single string
    read_html("https://www.psacard.com/pop/basketball-cards/1986/fleer/36766") %>%
      html_text2() %>%
      .[1]

I'm hoping to end up with a data frame that has one row per player and one column for each grade from 1 to 10.

Answer 1

Score: 2

The code below requests the data from the specified URL and processes it into a data frame:

    library(tidyverse)
    library(httr2)

    "https://www.psacard.com/Pop/GetSetItems" %>%
      request() %>%
      # Ask for a JSON response
      req_headers(Accept = "application/json") %>%
      # Form fields sent with the request
      req_body_form(
        draw = 1,
        start = 0,
        length = 300,
        headingID = 36766,
        categoryID = 20019,
        isPSADNA = "false"
      ) %>%
      req_perform() %>%
      # Parse the JSON body and keep its "data" element as a tibble
      resp_body_json(simplifyVector = TRUE) %>%
      pluck("data") %>%
      as_tibble()

    # A tibble: 133 × 39
       SpecID SubjectN…¹ SortO…² Variety CardN…³ CardN…⁴ GradeN0 Grade1Q Grade1 Grade…⁵ Grade…⁶ Grade2Q Grade2 Grade…⁷ Grade3Q Grade3 Grade…⁸ Grade4Q Grade4 Grade…⁹ Grade5Q Grade5
        <int> <chr>        <dbl> <chr>   <chr>     <int>   <int>   <int>  <int>   <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>
     1      0 TOTAL POP        0 NA      NA           NA     653       8    262       3     116       3    440      36       2    952      59       6   2809     114      10   6273
     2 299514 Kareem Ab     1146 ""      1             1       5       0      1       0       0       0      3       1       0     22       1       0     80       0       0    209
     3 299516 Alvan Ada     2570 ""      2             2       0       0      0       0       0       0      0       0       0      4       0       0     13       0       0     27
     4 299517 Mark Agui     5348 ""      3             3       0       0      0       0       0       0      1       0       0      2       0       0     11       0       0     30
     5 299518 Danny Ain     6400 ""      4             4       0       0      0       0       0       0      2       0       0      3       0       0     10       0       0     24
     6 299519 John Bagl     8378 ""      5             5       0       0      0       0       0       0      0       0       0      0       0       0      4       1       0     17
     7 299520 Thurl Bai    10567 ""      6             6       0       0      0       0       0       0      0       0       0      0       0       0      7       0       0     22
     8 299521 Charles B    10922 ""      7             7       5       0      6       0       3       0     26       2       0     39       2       0    129       4       0    264
     9 299524 Benoit Be    12281 ""      8             8       0       0      0       0       0       0      0       0       0      0       0       0      9       0       0     16
    10 299525 Larry Bird   14279 ""      9             9       0       0      3       0       2       0      4       1       0     16       0       0     46       0       0    131
    # … with 123 more rows, 17 more variables: Grade5_5 <int>, Grade6Q <int>, Grade6 <int>, Grade6_5 <int>, Grade7Q <int>, Grade7 <int>, Grade7_5 <int>, Grade8Q <int>,
    #   Grade8 <int>, Grade8_5 <int>, Grade9Q <int>, Grade9 <int>, Grade10 <int>, Total <int>, GradeTotal <int>, HalfGradeTotal <int>, QualifiedGradeTotal <int>, and abbreviated
    #   variable names ¹SubjectName, ²SortOrder, ³CardNumber, ⁴CardNumberSort, ⁵Grade1_5Q, ⁶Grade1_5, ⁷Grade2_5, ⁸Grade3_5, ⁹Grade4_5
    # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
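
To get from the 39 columns above to the layout asked for in the question (one row per player, one column per whole grade from 1 to 10), one option is to drop the summary row and keep only `Grade1` through `Grade10`. The following is a minimal sketch and not part of the original answer. It runs on a small stand-in tibble, here called `pop`, whose player names and counts are made up for illustration; only the column names follow the printed output. It also assumes the summary row is the one with `SpecID` 0, as in the output shown.

```r
library(dplyr)

# Hypothetical stand-in for the tibble returned by the request above.
# Only the column names follow the printed output; names and counts are made up.
pop <- tibble(
  SpecID      = c(0L, 1L, 2L),
  SubjectName = c("TOTAL POP", "Player A", "Player B"),
  Grade1Q     = c(1L, 1L, 0L),
  Grade1      = c(3L, 1L, 2L),
  Grade1_5    = c(2L, 2L, 0L),
  Grade2      = c(4L, 1L, 3L),
  Grade3      = c(5L, 2L, 3L),
  Grade4      = c(6L, 4L, 2L),
  Grade5      = c(7L, 3L, 4L),
  Grade6      = c(9L, 5L, 4L),
  Grade7      = c(12L, 6L, 6L),
  Grade8      = c(20L, 11L, 9L),
  Grade9      = c(8L, 5L, 3L),
  Grade10     = c(2L, 1L, 1L),
  Total       = c(79L, 42L, 37L)
)

whole_grades <- pop %>%
  # Drop the summary row (assumed to be the row with SpecID 0)
  filter(SpecID != 0) %>%
  # num_range() matches the exact names Grade1 ... Grade10,
  # so Grade1Q, Grade1_5 and the total columns are left out
  select(SubjectName, num_range("Grade", 1:10))

whole_grades
```

With the real data, `pop` would be the tibble produced by the pipeline above. The `Q` and `_5` columns are presumably qualified and half-point grades, judging by the `QualifiedGradeTotal` and `HalfGradeTotal` column names; if those are wanted too, they can be added to the `select()` call.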

huangapple
  • Posted on March 4, 2023, 07:38:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632724.html