Rvest continue navigating after submitting a form

Question

Suppose I want to use rvest to search Google. I can do that using the code below.

    url <- 'https://www.google.com/'
    search_parameters <-
      list('q' = 'dogs')
    search_results <-
      rvest::session(url) |>
      rvest::html_form() |>
      purrr::pluck(1) |>
      rvest::html_form_set(!!!search_parameters) |>
      rvest::html_form_submit()
    #> Submitting with 'btnG'
    search_results$status_code
    #> [1] 200

However, I can't figure out how to navigate to the first link of the results because html_form_submit() doesn't return a session object.

    search_parameters |>
      rvest::session_follow_link(1)
    #> Error in `check_session()`:
    #> ! `x` must be produced by session()
    #> Backtrace:
    #>     x
    #>  1. \-rvest::session_follow_link(search_parameters, 1)
    #>  2.   \-rvest:::check_session(x)
    #>  3.     \-rlang::abort("`x` must be produced by session()")

I know I could just create a new session for the example above, but that doesn't work if I need to log in to a site first. Is there a way to use the same session object to continue navigating?
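
A quick way to see why the call fails is to check what kind of object is being passed. The sketch below runs offline and only uses the `search_parameters` list from the question; `is_session_obj` is a name introduced here for illustration. The snippet above pipes `search_parameters` (a plain list) into `session_follow_link()`, but piping `search_results` should fail the same way, since `html_form_submit()` returns an httr response rather than a session.

```r
library(rvest)

# Same parameter list as in the question
search_parameters <- list(q = "dogs")

# session_*() functions only accept objects created by session();
# a plain list does not qualify, which is what the error is reporting
is_session_obj <- rvest::is.session(search_parameters)
print(is_session_obj)
```

The same check can be applied to any intermediate result in a pipeline to confirm that it is still a session before calling `session_follow_link()`.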

Answer 1

Score: 1

You are probably looking for session_submit(), which submits the form through the session and returns an updated session:

    url <- 'https://www.google.com/'
    search_parameters <-
      list('q' = 'dogs')
    # Keep the session object so it can be reused after the form is submitted
    s <- rvest::session(url)
    # `_` is the native-pipe placeholder (R >= 4.2); the filled form goes to `form`
    s <-
      rvest::html_form(s) |>
      purrr::pluck(1) |>
      rvest::html_form_set(!!!search_parameters) |>
      rvest::session_submit(s, form = _)
    #> Submitting with 'btnG'
    s |>
      rvest::session_follow_link(1)
    #> Navigating to
    #> https://accounts.google.com/ServiceLogin?...
    #> <session> https://accounts.google.com/v3/signin/identifier?...
    #> Status: 200
    #> Type: text/html; charset=utf-8
    #> Size: 555260

Created on 2023-06-01 with reprex v2.0.2
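
For the login case mentioned in the question, the same pattern can be wrapped in a small helper. This is only a sketch: `submit_form_in_session()` is a hypothetical function (not part of rvest), and the URL and field names in the usage lines are placeholders that would need to match the real site.

```r
library(rvest)

# Hypothetical helper (not part of rvest): fill one form on the session's
# current page and submit it through the session, so that the return value
# is an updated session rather than a bare HTTP response.
submit_form_in_session <- function(s, fields, form_index = 1) {
  # Fail early if `s` was not created by session()
  stopifnot(rvest::is.session(s))

  # Pick the form on the current page and fill in the named fields
  form <- rvest::html_form(s)[[form_index]]
  form <- rvest::html_form_set(form, !!!fields)

  # Submit via the session; the result can be navigated further
  rvest::session_submit(s, form)
}

# Usage sketch, left commented out because the URL and field names
# are assumptions for illustration only:
# s <- rvest::session("https://example.com/login")
# s <- submit_form_in_session(s, list(username = "me", password = "secret"))
# s <- rvest::session_follow_link(s, 1)
```

Reassigning the result to `s` at each step keeps a single session object moving through login, form submission, and link following.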

huangapple
  • This article was posted on June 1, 2023, 03:09:17
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/76376582.html