Scrape a webpage and put it into table form using rvest
Question
I am trying to scrape the card grades from this page and put each grade into its own column:
https://www.psacard.com/pop/basketball-cards/1986/fleer/36766
The only code that returns anything is the snippet below. I have also tried html_table(), but it doesn't pick anything up.
library(rvest)

read_html("https://www.psacard.com/pop/basketball-cards/1986/fleer/36766") %>%
  html_text2() %>%
  .[1]
I'm hoping to end up with a data frame that has one row per player and one column for each grade from 1 to 10.
Answer 1
Score: 2
library(tidyverse)
library(httr2)

# Send the form fields to the JSON endpoint and parse the response
"https://www.psacard.com/Pop/GetSetItems" %>%
  request() %>%
  req_headers(Accept = "application/json") %>%
  req_body_form(
    draw = 1,
    start = 0,
    length = 300,
    headingID = 36766,  # same ID as at the end of the page URL
    categoryID = 20019,
    isPSADNA = "false"
  ) %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  pluck("data") %>%     # keep only the "data" element of the parsed JSON
  as_tibble()
# A tibble: 133 × 39
   SpecID SubjectN…¹ SortO…² Variety CardN…³ CardN…⁴ GradeN0 Grade1Q Grade1 Grade…⁵ Grade…⁶ Grade2Q Grade2 Grade…⁷ Grade3Q Grade3 Grade…⁸ Grade4Q Grade4 Grade…⁹ Grade5Q Grade5
    <int> <chr>        <dbl> <chr>   <chr>     <int>   <int>   <int>  <int>   <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>
 1      0 TOTAL POP…       0 NA      NA           NA     653       8    262       3     116       3    440      36       2    952      59       6   2809     114      10   6273
 2 299514 Kareem Ab…    1146 ""      1             1       5       0      1       0       0       0      3       1       0     22       1       0     80       0       0    209
 3 299516 Alvan Ada…    2570 ""      2             2       0       0      0       0       0       0      0       0       0      4       0       0     13       0       0     27
 4 299517 Mark Agui…    5348 ""      3             3       0       0      0       0       0       0      1       0       0      2       0       0     11       0       0     30
 5 299518 Danny Ain…    6400 ""      4             4       0       0      0       0       0       0      2       0       0      3       0       0     10       0       0     24
 6 299519 John Bagl…    8378 ""      5             5       0       0      0       0       0       0      0       0       0      0       0       0      4       1       0     17
 7 299520 Thurl Bai…   10567 ""      6             6       0       0      0       0       0       0      0       0       0      0       0       0      7       0       0     22
 8 299521 Charles B…   10922 ""      7             7       5       0      6       0       3       0     26       2       0     39       2       0    129       4       0    264
 9 299524 Benoit Be…   12281 ""      8             8       0       0      0       0       0       0      0       0       0      0       0       0      9       0       0     16
10 299525 Larry Bird   14279 ""      9             9       0       0      3       0       2       0      4       1       0     16       0       0     46       0       0    131
# … with 123 more rows, 17 more variables: Grade5_5 <int>, Grade6Q <int>, Grade6 <int>, Grade6_5 <int>, Grade7Q <int>, Grade7 <int>, Grade7_5 <int>, Grade8Q <int>,
# Grade8 <int>, Grade8_5 <int>, Grade9Q <int>, Grade9 <int>, Grade10 <int>, Total <int>, GradeTotal <int>, HalfGradeTotal <int>, QualifiedGradeTotal <int>, and abbreviated
# variable names ¹SubjectName, ²SortOrder, ³CardNumber, ⁴CardNumberSort, ⁵Grade1_5Q, ⁶Grade1_5, ⁷Grade2_5, ⁸Grade3_5, ⁹Grade4_5
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
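The result has 39 columns, including half grades (Grade1_5), qualified grades (Grade1Q), and totals, while the question asks only for grades 1-10 with one row per player. The following is a minimal sketch of one way to trim it down, not part of the original answer. It assumes the row with SpecID 0 is the set-wide totals row (as the printout suggests) and that the whole-number grade columns are exactly the ones named Grade followed by digits. The `pop` tibble is a hypothetical stand-in for the real result: its Grade1 and Grade2 values are copied from the printout, and its Grade10 values are placeholders because that column is not shown above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the tibble returned by the request above.
# Names are kept truncated exactly as printed; Grade10 values are placeholders.
pop <- tibble(
  SpecID      = c(0L, 299514L, 299525L),
  SubjectName = c("TOTAL POP…", "Kareem Ab…", "Larry Bird"),
  GradeN0     = c(653L, 5L, 0L),
  Grade1Q     = c(8L, 0L, 0L),
  Grade1      = c(262L, 1L, 3L),
  Grade1_5    = c(116L, 0L, 2L),
  Grade2      = c(440L, 3L, 4L),
  Grade10     = c(0L, 0L, 0L)   # placeholder values, not real data
)

whole_grades <- pop %>%
  filter(SpecID != 0) %>%                           # drop the totals row (assumed to be SpecID 0)
  select(SubjectName, matches("^Grade[0-9]+$"))     # keep Grade1, Grade2, ..., Grade10 only

print(whole_grades)
```

The anchored pattern skips GradeN0, the Q columns, the half grades, and the total columns, because none of them is "Grade" followed only by digits. On the real tibble, replace `pop` with the result of the pipeline above.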