R Query of a Wikimedia server

Question

I am trying to query the Cameo database.

If I open the URL https://cameo.mfa.org/api.php?action=query&pageids=17051&prop=extracts&format=json in a browser, I get valid output.

However, if I use:

``` r
library(httr)
library(jsonlite)

base_url <- "https://cameo.mfa.org/api.php"

query_param <- list(action  = "query",
                    pageids = "17051",
                    format = "json",
                    prop = "extracts"
)

parsed_content <- httr::GET(base_url, query_param)

jsonlite::fromJSON(content(parsed_content, as = "text", encoding = "UTF-8"))
```

then `jsonlite::fromJSON()` fails because the response is HTML rather than JSON.

Do you have any advice on this?

Answer 1

Score: 2

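The second argument to `httr::GET()` is `config =`, which is not where `query_param` belongs. Because the list was passed positionally, the parameters never became part of the URL, so the request reached `api.php` without them and the server answered with an HTML page. Pass the list by name instead, as `query = query_param`: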
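To check where the named `query` list ends up without sending a request, `httr::modify_url()` can build the URL offline. This is a minimal sketch that only assumes httr is installed; its result should match the URL that `GET(base_url, query = query_param)` requests, which is the one shown in the `Response [...]` line below.

``` r
library(httr)

base_url <- "https://cameo.mfa.org/api.php"
query_param <- list(action  = "query",
                    pageids = "17051",
                    format  = "json",
                    prop    = "extracts")

# Build the request URL offline: the named `query` list is encoded
# and appended as the query string (?action=...&pageids=...)
full_url <- httr::modify_url(base_url, query = query_param)
print(full_url)
```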

``` r
res <- httr::GET(base_url, query = query_param)
res
# Response [https://cameo.mfa.org/api.php?action=query&pageids=17051&format=json&prop=extracts]
#   Date: 2023-07-03 15:06
#   Status: 200
#   Content-Type: application/json; charset=utf-8
#   Size: 5.22 kB
str(httr::content(res))
# List of 3
#  $ batchcomplete: chr ""
#  $ warnings     :List of 1
#   ..$ extracts:List of 1
#   .. ..$ *: chr "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are li"| __truncated__
#  $ query        :List of 1
#   ..$ pages:List of 1
#   .. ..$ 17051:List of 4
#   .. .. ..$ pageid : int 17051
#   .. .. ..$ ns     : int 0
#   .. .. ..$ title  : chr "Copper"
#   .. .. ..$ extract: chr "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is "| __truncated__
```
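If you would rather get a clear error than a `fromJSON()` parse failure the next time the server sends HTML, one option is to check the HTTP status and the content type before parsing. The sketch below wraps the corrected call in a hypothetical helper, `fetch_mediawiki_json()` (not part of httr or jsonlite), built on `httr::stop_for_status()` and `httr::http_type()`.

``` r
library(httr)
library(jsonlite)

# Hypothetical helper name, not part of httr or jsonlite.
# Sends the request with parameters passed BY NAME as `query`,
# then refuses to parse anything that is not JSON.
fetch_mediawiki_json <- function(base_url, params) {
  res <- httr::GET(base_url, query = params)

  # Turn HTTP 4xx/5xx responses into R errors
  httr::stop_for_status(res)

  # http_type() returns the media type without parameters,
  # e.g. "application/json" for "application/json; charset=utf-8"
  type <- httr::http_type(res)
  if (type != "application/json") {
    stop("Expected JSON but got ", type, " from ", res$url)
  }

  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
}
```

For example, `fetch_mediawiki_json(base_url, query_param)$query$pages` should return the same nested page list as above, assuming the server keeps answering with `application/json`.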

Answer 2

Score: 1

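A slightly different approach: build the URL with `httr::parse_url()` and `httr::build_url()`, then let `jsonlite::fromJSON()` download and parse it directly: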

``` r
library(httr)
library(jsonlite)

url <- httr::parse_url("https://cameo.mfa.org/api.php")
url$query <- list(
  action = "query",
  pageids = "17051",
  format = "json",
  prop = "extracts"
)

json <- jsonlite::fromJSON(httr::build_url(url))


json$query$pages
#> $`17051`
#> $`17051`$pageid
#> [1] 17051
#> 
#> $`17051`$ns
#> [1] 0
#> 
#> $`17051`$title
#> [1] "Copper"
#> 
#> $`17051`$extract
#> [1] "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is present [...]"
```

<sup>Created on 2023-07-03 with reprex v2.0.2</sup>
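With either approach, `query$pages` is a list named by page id, so the name changes with every `pageids` value. A minimal sketch of reading the fields without hard-coding `17051`; to keep it runnable offline, the JSON string is a hypothetical, abbreviated stand-in shaped like the response above, and its `extract` value is a placeholder rather than the real Cameo text.

``` r
library(jsonlite)

# Hypothetical, abbreviated response shaped like the one above;
# the "extract" value is a placeholder, not real Cameo content
sample_json <- '{
  "batchcomplete": "",
  "query": {
    "pages": {
      "17051": {"pageid": 17051, "ns": 0, "title": "Copper",
                "extract": "<p>placeholder</p>"}
    }
  }
}'

json <- jsonlite::fromJSON(sample_json)

# Look one page up by its id, converted to the character list name
page_id <- 17051
page <- json$query$pages[[as.character(page_id)]]
print(page$title)

# Or collect a field from every returned page, whatever the ids are
titles <- vapply(json$query$pages, function(p) p$title, character(1))
print(titles)
```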

huangapple
  • Posted on 2023-07-03 22:13:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605589.html