Rvest continue navigating after submitting a form

Question

Suppose I want to use rvest to search Google. I can do that using the code below.

url <- 'https://www.google.com/'

search_parameters <-
  list('q' = 'dogs')

search_results <- 
  rvest::session(url) |>
  rvest::html_form() |>
  purrr::pluck(1) |>
  rvest::html_form_set(!!!search_parameters) |>
  rvest::html_form_submit()
#> Submitting with 'btnG'

search_results$status_code
#> [1] 200

However, I can't figure out how to navigate to the first link of the results because html_form_submit() doesn't return a session object.


search_parameters |>
  rvest::session_follow_link(1)
#> Error in `check_session()`:
#> ! `x` must be produced by session()

#> Backtrace:
#>     x
#>  1. \-rvest::session_follow_link(search_parameters, 1)
#>  2.   \-rvest:::check_session(x)
#>  3.     \-rlang::abort("`x` must be produced by session()")

I know I could just create a new session for the example above, but that doesn't work if I need to log in to a site first. Is there a way to use the same session object to continue navigating?
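The form-filling half of the question's pipeline can be checked without touching the network. The sketch below builds a hypothetical stand-in for a search form with rvest::minimal_html() (the markup, the `/search` action and the https://www.example.com/ base URL are assumptions, not Google's real page) and shows that `!!!` splices the named list into html_form_set(), which is equivalent to writing `q = "dogs"` directly.

```r
library(rvest)

# Hypothetical stand-in for a search form, built locally so that
# no network access is needed (not Google's real markup)
page <- minimal_html('
  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" name="btnG" value="Search">
  </form>
')

search_parameters <- list(q = "dogs")

# html_form() on a whole document returns a list of forms;
# base_url is only used to resolve the relative action URL
form <- html_form(page, base_url = "https://www.example.com/")[[1]]

# `!!!` splices the named list into html_form_set()'s dots, so this is
# the same as html_form_set(form, q = "dogs")
filled <- html_form_set(form, !!!search_parameters)

filled$fields[["q"]]$value
#> [1] "dogs"
```

Nothing is sent here; submitting the filled form is the step where html_form_submit() and session_submit() differ.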

Answer 1

Score: 1

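You are probably looking for session_submit(). Unlike html_form_submit(), which returns an httr response, it returns the updated session, so you can keep navigating from it (the `form = _` pipe placeholder below needs R 4.2.0 or later):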

url <- 'https://www.google.com/'

search_parameters <-
  list('q' = 'dogs')

s <- rvest::session(url)

# Fill in the first form and submit it through the session,
# so the result is still a session object
s <- 
  rvest::html_form(s) |> 
  purrr::pluck(1) |> 
  rvest::html_form_set(!!!search_parameters) |> 
  rvest::session_submit(s, form = _) 
#> Submitting with 'btnG'

# `s` is a session again, so session_follow_link() accepts it
s |>
  rvest::session_follow_link(1)
#> Navigating to
#> https://accounts.google.com/ServiceLogin?...
#> <session> https://accounts.google.com/v3/signin/identifier?...
#&gt;   Status: 200
#&gt;   Type:   text/html; charset=utf-8
#&gt;   Size:   555260

Created on 2023-06-01 with reprex v2.0.2
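The same pattern should cover the log-in case raised in the question: submit the login form with session_submit() and keep navigating from the session it returns. A minimal sketch, assuming the login form is the first form on the page; the helper name login_then_visit(), the URLs and the credential field names are hypothetical and must be adapted to the real site.

```r
library(rvest)

# Hypothetical helper: log in through the first form on `login_url`,
# then keep using the same session to open `target_url`.
# The names in `credentials` must match the site's real login fields.
login_then_visit <- function(login_url, credentials, target_url) {
  s <- session(login_url)

  login_form <- html_form(s)[[1]]
  login_form <- html_form_set(login_form, !!!credentials)

  # session_submit() returns the updated session (html_form_submit()
  # would return a bare response), so cookies set at login should
  # carry over to later requests made with `s`
  s <- session_submit(s, login_form)

  # Continue navigating with the logged-in session
  session_jump_to(s, target_url)
}

# Example call (hypothetical URLs and field names; not run here):
# s <- login_then_visit(
#   "https://example.com/login",
#   list(username = "me", password = "secret"),
#   "https://example.com/account"
# )
# s |> session_follow_link(1)
```

Returning the session rather than parsed HTML keeps the caller free to follow links, jump elsewhere, or call read_html(s) when the content is needed.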

huangapple
  • Published on 2023-06-01 03:09:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76376582.html