Rvest Pulls Empty Tables

Question


The site I scrape data from has changed, and I'm having trouble pulling the data into table format. I tried the two approaches below to get the tables, but both return blanks instead of tables.

I'm a novice at scraping and would appreciate the group's expertise. Should I look for other solutions in rvest, or try to learn a tool like RSelenium?

https://www.pgatour.com/stats/detail/02675

Scrape for Multiple Links

    library("dplyr")
    library("purrr")
    library("tidyr")  # provides unnest()
    library("rvest")

    # One row per stat ID, with the matching stats page URL
    df23 <- expand.grid(
      stat_id = c("02568", "02674", "02567", "02564", "101")
    ) %>%
      mutate(
        links = paste0(
          "https://www.pgatour.com/stats/detail/",
          stat_id
        )
      ) %>%
      as_tibble()

    # replaced tournament_id with stat_id
    # Read a page and return the second HTML table found on it
    get_info <- function(link, stat_id) {
      link %>%
        read_html() %>%
        html_table() %>%
        .[[2]]
    }

    # possibly() substitutes an empty tibble when get_info() errors
    test_main_stats <- df23 %>%
      mutate(tables = map2(links, stat_id, possibly(get_info, otherwise = tibble())))

    test_main_stats <- test_main_stats %>%
      unnest(everything())

Alternative Code

    url <- read_html("https://www.pgatour.com/stats/detail/02568")
    test1 <- url %>%
      html_nodes(".css-8atqhb") %>%
      html_table()

Answer 1

Score: 1

This page uses JavaScript to build the table, so rvest will not work on it directly. However, if you look at the page's source code, you will find that all of the data is stored there in JSON format.
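The idea of reading JSON embedded in the page source, rather than a rendered table, might be sketched roughly as follows. This is a minimal illustration on a made-up inline HTML string, not the actual PGA Tour page: the `page-data` id, the `rows` field, and the player values are all hypothetical, and using the jsonlite package for parsing is an assumption. On the real site you would need to inspect the page source to find which script node holds the data and how the JSON is nested.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for a JavaScript-rendered page: the visible table is
# built in the browser, but the raw data ships inside a <script> node as JSON.
html <- '<html><body>
<div id="app"></div>
<script id="page-data" type="application/json">
{"rows":[{"player":"Player A","avg":1.25},{"player":"Player B","avg":0.98}]}
</script>
</body></html>'

page <- read_html(html)

# Pull the raw JSON text out of the script node (the selector is an assumption)
json_text <- page %>%
  html_element("script#page-data") %>%
  html_text()

# Parse the JSON; an array of flat objects is simplified to a data frame
parsed <- fromJSON(json_text)
stats_table <- as.data.frame(parsed$rows)

print(stats_table)
```

For a live page, the same steps would start from `read_html(url)` instead of the inline string, and the path into the parsed list would follow whatever structure that site's JSON actually has.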
