Scrape a webpage and put it into table form using rvest

Question

I am trying to scrape card grades from this page and put each grade into its own column:
https://www.psacard.com/pop/basketball-cards/1986/fleer/36766

The only way I can pick anything up is with the code below. I've also tried html_table(), but it doesn't pick anything up.

read_html("https://www.psacard.com/pop/basketball-cards/1986/fleer/36766") %>%
  html_text2() %>%
  .[1]

I'm hoping to end up with a data frame that has one row per player and one column for each grade from 1 to 10.

Answer 1

Score: 2

The R code below loads the libraries, requests the data from the site's endpoint, and converts the response into a data frame:

library(tidyverse)
library(httr2)

# Send the form fields to the endpoint and parse the JSON response
"https://www.psacard.com/Pop/GetSetItems" %>%
  request() %>%
  req_headers(Accept = "application/json") %>%
  req_body_form(
    draw = 1,
    start = 0,
    length = 300,
    headingID = 36766,   # same ID as at the end of the page URL
    categoryID = 20019,
    isPSADNA = "false"
  ) %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  pluck("data") %>%      # keep only the "data" element of the response
  as_tibble()

# A tibble: 133 × 39
   SpecID SubjectN…¹ SortO…² Variety CardN…³ CardN…⁴ GradeN0 Grade1Q Grade1 Grade…⁵ Grade…⁶ Grade2Q Grade2 Grade…⁷ Grade3Q Grade3 Grade…⁸ Grade4Q Grade4 Grade…⁹ Grade5Q Grade5
    <int> <chr>        <dbl> <chr>   <chr>     <int>   <int>   <int>  <int>   <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>
 1      0 TOTAL POP…       0 NA      NA           NA     653       8    262       3     116       3    440      36       2    952      59       6   2809     114      10   6273
 2 299514 Kareem Ab…    1146 ""      1             1       5       0      1       0       0       0      3       1       0     22       1       0     80       0       0    209
 3 299516 Alvan Ada…    2570 ""      2             2       0       0      0       0       0       0      0       0       0      4       0       0     13       0       0     27
 4 299517 Mark Agui…    5348 ""      3             3       0       0      0       0       0       0      1       0       0      2       0       0     11       0       0     30
 5 299518 Danny Ain…    6400 ""      4             4       0       0      0       0       0       0      2       0       0      3       0       0     10       0       0     24
 6 299519 John Bagl…    8378 ""      5             5       0       0      0       0       0       0      0       0       0      0       0       0      4       1       0     17
 7 299520 Thurl Bai…   10567 ""      6             6       0       0      0       0       0       0      0       0       0      0       0       0      7       0       0     22
 8 299521 Charles B…   10922 ""      7             7       5       0      6       0       3       0     26       2       0     39       2       0    129       4       0    264
 9 299524 Benoit Be…   12281 ""      8             8       0       0      0       0       0       0      0       0       0      0       0       0      9       0       0     16
10 299525 Larry Bird   14279 ""      9             9       0       0      3       0       2       0      4       1       0     16       0       0     46       0       0    131
# … with 123 more rows, 17 more variables: Grade5_5 <int>, Grade6Q <int>, Grade6 <int>, Grade6_5 <int>, Grade7Q <int>, Grade7 <int>, Grade7_5 <int>, Grade8Q <int>,
#   Grade8 <int>, Grade8_5 <int>, Grade9Q <int>, Grade9 <int>, Grade10 <int>, Total <int>, GradeTotal <int>, HalfGradeTotal <int>, QualifiedGradeTotal <int>, and abbreviated
#   variable names ¹​SubjectName, ²​SortOrder, ³​CardNumber, ⁴​CardNumberSort, ⁵​Grade1_5Q, ⁶​Grade1_5, ⁷​Grade2_5, ⁸​Grade3_5, ⁹​Grade4_5
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
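
The question asks for one row per player with a column per whole grade, so the tibble above may still need trimming. The following is only a sketch under a few assumptions: `pop` is a small hand-built stand-in for the result (values copied from the printed output, most of the 39 columns left out), `"TOTAL"` is a placeholder label for the truncated summary row name, and `SpecID == 0` is assumed to identify that summary row because that is how it appears in the output.

```r
library(dplyr)

# Stand-in for the tibble returned by the request; only a few columns are
# included, with counts copied from the printed output above.
pop <- tibble(
  SpecID      = c(0L, 299525L),
  SubjectName = c("TOTAL", "Larry Bird"),  # "TOTAL" is a placeholder label
  Grade1Q     = c(8L, 0L),
  Grade1      = c(262L, 3L),
  Grade1_5    = c(116L, 2L),
  Grade2      = c(440L, 4L),
  Grade3      = c(952L, 16L)
)

grades <- pop %>%
  filter(SpecID != 0) %>%                        # assumed: SpecID 0 is the summary row
  select(SubjectName, matches("^Grade[0-9]+$"))  # whole grades only, no Q or half grades

grades
```

The regular expression keeps columns such as Grade1 through Grade10 and skips qualified (Q), half-grade (_5), and total columns; on the real result the same two steps should apply as long as the column names match the printed ones.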

Posted by huangapple on 2023-03-04 07:38:09