Querying API for every unique resource in a package using R

Question

I'm writing a script to download all unique Excel files in a package from an open data site using CKAN. I'm trying to write a function that loops over a list of the unique dataset IDs, gets the URL for each ID, and downloads the dataset to my computer. However, I'm having trouble writing the function.

So far the function only gives me the first dataset in the package, but there are 3 more that need to be downloaded.

library(tidyverse)
library(ckanr)
library(jsonlite)
library(readxl)
library(curl)
library(janitor)
library(mlr3misc)


url <- "http://osmdatacatalog.alberta.ca/" # set url to access data
ckanr_setup(url = url)

x <- resource_search(q = "name:wetland monitoring benthic invertebrate community", limit = 10) # get id of data
id <- ids(x$results)

id_download <- function(id) {
  for (i in id)
    a <- resource_show(i)
    b <- a$url
    destfile <- paste("C:/Users/Name/Documents/Database_updates/OSM_benthic_invertebrates/",basename(b))
    curl::curl_download(b, destfile)
}

Does anyone know where I'm going wrong?

Answer 1

Score: 0


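The for loop needs curly braces after it. Only the code inside the braces is executed on each iteration. Without braces, R treats just the first statement after for (...) as the loop body, so the remaining lines run once, after the loop has finished.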
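To see the effect of the braces in isolation, here is a minimal sketch that needs no network access. The vector ids_demo and the two helper functions are made up for illustration and are not part of the original script.

```r
# Hypothetical stand-ins for resource IDs (not real CKAN IDs)
ids_demo <- c("aaa", "bbb", "ccc")

# Without braces: only the first statement after for (...) is the loop body
no_braces <- function(ids) {
  seen <- character(0)
  for (i in ids)
    last <- i              # loop body: runs once per ID
  seen <- c(seen, last)    # runs once, after the loop has finished
  seen
}

# With braces: both statements run on every iteration
with_braces <- function(ids) {
  seen <- character(0)
  for (i in ids) {
    last <- i
    seen <- c(seen, last)
  }
  seen
}

no_braces(ids_demo)    # "ccc"
with_braces(ids_demo)  # "aaa" "bbb" "ccc"
```

In the un-braced version, everything after the first statement sees only the value left over from the final iteration, which is why a single file ends up being downloaded.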

It also looks like all the files might have the same name. If they do, they will overwrite each other. To be safe, it makes sense to add something to the destfile name so that all the file names are unique; below I prefix the first four characters of the resource ID. This worked for me:

library(ckanr)     # resource_search(), resource_show()
library(mlr3misc)  # ids(), as loaded in the question

dir.create("invertebrates") # folder for the downloaded files

url <- "http://osmdatacatalog.alberta.ca/" # set url to access data
ckanr_setup(url = url)

x <- resource_search(q = "name:wetland monitoring benthic invertebrate community", limit = 10) # get id of data
id <- ids(x$results)

id_download <- function(id) {

  for (i in id) {
    a <- resource_show(i) # look up the resource metadata
    b <- a$url            # download URL of the resource

    # prefix the first four characters of the ID to keep file names unique
    destfile <- paste0("./invertebrates/",
                       substr(i, 1, 4),
                       basename(b))

    curl::curl_download(b, destfile)

  }
}

id_download(id)
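The file-naming step can also be checked on its own, without touching the API. The following is only a sketch: make_destfile is a hypothetical helper, and the IDs and URLs are invented for the example. It also shows why paste() in the question's code is risky: its default separator is a space, so the file name would start with a blank.

```r
# Hypothetical helper: builds a destination path from a resource ID and its URL
make_destfile <- function(id, url, dir = "invertebrates") {
  # file.path() joins the parts with "/" and adds no stray spaces
  file.path(dir, paste0(substr(id, 1, 4), "_", basename(url)))
}

# Made-up IDs and URLs that share the same base file name
demo_ids  <- c("1a2b3c4d", "9z8y7x6w")
demo_urls <- c("http://example.org/files/data.xlsx",
               "http://example.org/other/data.xlsx")

dest <- mapply(make_destfile, demo_ids, demo_urls, USE.NAMES = FALSE)
dest
# "invertebrates/1a2b_data.xlsx" "invertebrates/9z8y_data.xlsx"

# paste() with its default sep = " " inserts a space after the directory
paste("invertebrates/", "data.xlsx")
# "invertebrates/ data.xlsx"
```

A four-character prefix is only unique if the IDs differ within their first four characters; if that is in doubt, using the full ID as the prefix is the safer choice.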

huangapple
  • Posted on March 7, 2023, 06:13:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75656332.html