Rvest Pulls Empty Tables

Question

The site I scrape has changed, and I'm having trouble pulling its data into table format. I tried the two approaches below to get the tables, but both return blanks instead of tables.

I'm new to scraping and would appreciate the group's expertise. Should I look for other solutions in rvest, or try to learn a tool like RSelenium?

https://www.pgatour.com/stats/detail/02675

Scrape for Multiple Links

library("dplyr")
library("purrr")
library("tidyr")  # unnest() comes from tidyr
library("rvest")

# One row per stat page, with the URL built from the stat ID
df23 <- expand.grid(
  stat_id = c("02568", "02674", "02567", "02564", "101")
) %>%
  mutate(
    links = paste0(
      'https://www.pgatour.com/stats/detail/',
      stat_id
    )
  ) %>%
  as_tibble()

# replaced tournament_id with stat_id
# Read the page and return the second table found in its HTML
get_info <- function(link, stat_id){
  link %>%
    read_html() %>%
    html_table() %>%
    .[[2]]
}

# possibly() swaps in an empty tibble when a page fails to parse
test_main_stats <- df23 %>%
  mutate(tables = map2(links, stat_id, possibly(get_info, otherwise = tibble())))

test_main_stats <- test_main_stats %>%
  unnest(everything())

Alternative Code

url <- read_html("https://www.pgatour.com/stats/detail/02568")
test1 <- url %>%
  html_nodes(".css-8atqhb") %>%
  html_table()

Answer 1

Score: 1

This page uses JavaScript to build the table, so rvest cannot read it directly. However, if you look at the page source, you will see that all of the data is stored in JSON format in a [...]
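
The general idea of reading JSON out of the page source can be sketched as follows. This is only an illustration under stated assumptions: it assumes the JSON sits inside a script element, and the inline HTML, the id `stats-data`, and the field names `rows`, `player`, and `avg` are all made up for the example. The real page's element and JSON layout have to be checked in its source, and jsonlite is an extra package not used in the question.

```r
library(rvest)     # read_html(), html_element(), html_text(), html_table()
library(jsonlite)  # fromJSON()

# Stand-in for the downloaded page: hypothetical markup, not the real site.
html <- '<html><body>
  <div id="table-root"></div>
  <script id="stats-data" type="application/json">
    {"rows": [{"player": "Player A", "avg": 1.25},
              {"player": "Player B", "avg": 0.75}]}
  </script>
</body></html>'

page <- read_html(html)

# The static HTML contains no <table>, so html_table() returns an empty list.
tables <- html_table(page)

# Pull the raw JSON text out of the (assumed) script element.
json_text <- page %>%
  html_element("#stats-data") %>%
  html_text()

# A JSON array of objects is simplified into a data frame by default.
stats <- fromJSON(json_text)$rows
print(stats)
```

On a real page, `html` would be replaced by the URL passed to `read_html()`, and the selector and the path into the parsed list would follow whatever the page source actually contains.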