R Query of a Wikimedia server

Question

I am trying to query the Cameo database.

If I open the URL https://cameo.mfa.org/api.php?action=query&pageids=17051&prop=extracts&format=json in a browser, I get valid output.

However, if I use:

    library(httr)
    library(jsonlite)
    base_url <- "https://cameo.mfa.org/api.php"
    query_param <- list(action = "query",
                        pageids = "17051",
                        format = "json",
                        prop = "extracts"
    )
    parsed_content <- httr::GET(base_url, query_param)
    jsonlite::fromJSON(content(parsed_content, as = "text", encoding = "UTF-8"))

Then jsonlite fails because the output is HTML rather than JSON.

Do you have any advice on this?

Answer 1

Score: 2

The second argument to httr::GET is config=, so passing query_param positionally does not set the query string. Pass it by name instead: query = query_param.

    res <- httr::GET(base_url, query = query_param)
    res
    # Response [https://cameo.mfa.org/api.php?action=query&pageids=17051&format=json&prop=extracts]
    # Date: 2023-07-03 15:06
    # Status: 200
    # Content-Type: application/json; charset=utf-8
    # Size: 5.22 kB
    str(httr::content(res))
    # List of 3
    # $ batchcomplete: chr ""
    # $ warnings :List of 1
    # ..$ extracts:List of 1
    # .. ..$ *: chr "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are li"| __truncated__
    # $ query :List of 1
    # ..$ pages:List of 1
    # .. ..$ 17051:List of 4
    # .. .. ..$ pageid : int 17051
    # .. .. ..$ ns : int 0
    # .. .. ..$ title : chr "Copper"
    # .. .. ..$ extract: chr "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is "| __truncated__
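
To check that the parameters really end up in the query string, the URL can be built without sending a request. The sketch below does that with httr::modify_url and then wraps the request in a small helper that stops early when the response is not JSON. The helper name get_json is hypothetical and not part of httr; the status and content-type checks are one possible safeguard, not something the server requires.

```r
library(httr)
library(jsonlite)

base_url <- "https://cameo.mfa.org/api.php"
query_param <- list(
  action  = "query",
  pageids = "17051",
  format  = "json",
  prop    = "extracts"
)

# Build the final URL offline to confirm the query string is attached.
full_url <- httr::modify_url(base_url, query = query_param)
print(full_url)

# Hypothetical helper (the name is an assumption for illustration):
# sends the request with a *named* query argument and refuses to parse
# anything that is not JSON.
get_json <- function(url, query) {
  res <- httr::GET(url, query = query)
  httr::stop_for_status(res)  # raise an error on HTTP 4xx/5xx
  if (httr::http_type(res) != "application/json") {
    stop("Expected JSON but received: ", httr::http_type(res))
  }
  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
}
```

Calling get_json(base_url, query_param) would then either return the parsed list or fail with a message naming the content type that was actually received, which makes a mistake like the positional argument easier to spot.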

Answer 2

Score: 1

A slightly different approach:

``` r
library(httr)
library(jsonlite)
url <- httr::parse_url("https://cameo.mfa.org/api.php")
url$query <- list(
  action = "query",
  pageids = "17051",
  format = "json",
  prop = "extracts"
)
json <- jsonlite::fromJSON(httr::build_url(url))
json$query$pages
#> $`17051`
#> $`17051`$pageid
#> [1] 17051
#>
#> $`17051`$ns
#> [1] 0
#>
#> $`17051`$title
#> [1] "Copper"
#>
#> $`17051`$extract
#> [1] "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is present [...]"
```

Created on 2023-07-03 with reprex v2.0.2
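
Once the response is parsed, the page entry sits under a name equal to the page id, so it has to be looked up with a character key. The following is a minimal sketch of that lookup, run on an abbreviated stand-in string rather than the real response: sample_json only mimics the shape shown in the output above, and its shortened extract text is an assumption for illustration. The tag stripping is a crude regular expression, not a real HTML parser.

```r
library(jsonlite)

# Abbreviated stand-in for the API response (assumption: same shape as the
# output shown above, with a shortened extract).
sample_json <- '{"batchcomplete":"","query":{"pages":{"17051":{"pageid":17051,"ns":0,"title":"Copper","extract":"<h2>Description</h2><p>A reddish-brown, ductile, metallic element.</p>"}}}}'

parsed <- jsonlite::fromJSON(sample_json)

# Page ids are list names, so index with a character string.
page <- parsed$query$pages[["17051"]]
print(page$title)

# Crude tag removal: replace tags with spaces, then collapse whitespace.
plain <- gsub("<[^>]+>", " ", page$extract)
plain <- trimws(gsub("\\s+", " ", plain))
print(plain)
```

For anything beyond a quick look at the text, a proper HTML parser would be a safer choice than the regular expression used here, especially since the API itself warns that the extract HTML may be malformed.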

Posted by huangapple on 2023-07-03 22:13:27
Source: https://go.coder-hub.com/76605589.html