How to download an .xls file with merged cells from a URL using R
Question

I've been trying to download an .xls data file directly from its URL. I tried the approach from this question (https://stackoverflow.com/questions/46552923/downloading-excel-file-using-r) and the readHTMLTable function, but apparently that only works for tables in HTML documents, not for .xls files.

It kind of worked with:

my_url <- 'https://ftp.ibge.gov.br/Censos/Censo_Demografico_2022/Previa_da_Populacao/POP2022_Brasil_e_UFs.xls'

download.file(url=my_url, fillMergedCells = TRUE, destfile='./df.xlsx')

But when I try to open it with:

df <- read_xlsx('./df.xlsx')

I get the following error:

> Error in utils::unzip(zip_path, list = TRUE) :
> zip file './df.xlsx' cannot be opened

My data is not zipped.

Also, I could open the file on my computer outside of R, but it came out garbled, in a different encoding with lots of "??????". I solved that problem with:

download.file(url=my_url, fillMergedCells = TRUE, destfile='./df.xlsx', mode = "wb")

But now when I try to read the data frame, it returns the error:

> Error: `` not found

I know there is a way to download the file directly from the URL and open it properly, but I just can't make it work.

I'm not sure whether it's relevant, but the first row of the file is made up of merged cells.

Answer 1

Score: 2

The source file referenced in the URL is of type ".xls", but you store it and attempt to process it as an ".xlsx". The xls and xlsx file formats are not the same: an .xlsx file is a zip container, which is why read_xlsx fails inside utils::unzip when it is handed a legacy .xls file. You should keep the .xls extension and read the file with read_xls:

# The remote file is a legacy .xls workbook, so keep that extension locally
my_url <- 'https://ftp.ibge.gov.br/Censos/Censo_Demografico_2022/Previa_da_Populacao/POP2022_Brasil_e_UFs.xls'
# mode = "wb" writes the bytes unchanged (matters on Windows).
# fillMergedCells is not a download.file() argument, so it is dropped here.
download.file(url = my_url, destfile = './df.xls', mode = "wb")
df <- readxl::read_xls('./df.xls')
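To make the point about xls and xlsx being different formats concrete, here is a minimal base-R sketch that looks at the first bytes of a file instead of trusting its extension. The helper name `detect_excel_format()` is made up for this illustration and is not part of readxl. It checks only the two well-known signatures (OLE2 compound file for legacy .xls, zip for .xlsx), so treat it as a rough diagnostic rather than a full validator.

```r
# Hypothetical helper (not from readxl): guess the Excel container format
# from the file signature rather than from the file extension.
detect_excel_format <- function(path) {
  sig <- readBin(path, what = "raw", n = 8)
  # OLE2 compound document header, used by legacy .xls workbooks
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  # Local file header of a zip archive; .xlsx workbooks are zip containers
  zip <- as.raw(c(0x50, 0x4B, 0x03, 0x04))
  if (length(sig) >= 8 && identical(sig[1:8], ole2)) {
    "xls"
  } else if (length(sig) >= 4 && identical(sig[1:4], zip)) {
    "xlsx"
  } else {
    "unknown"
  }
}

# Offline demonstration with a tiny synthetic file (not a real workbook):
# it carries the .xls signature but a misleading .xlsx extension,
# which mirrors what happened with './df.xlsx' in the question.
fake <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)), fake)
print(detect_excel_format(fake))
```

Running the same check on the downloaded file would show which reader matches it. If the merged title row at the top of the sheet gets in the way afterwards, `read_xls()` has a `skip` argument for dropping leading rows; how many rows to skip for this particular file has to be checked by inspecting it.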

huangapple
  • Posted on May 29, 2023, 21:30:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76357813.html