Problem in function map(), Caused by error in `UseMethod()`
Question
I tried to map a folder of .html files into RDS, but sometimes the function fails, as shown below:
library(rvest)    # read_html(), html_text2()
library(stringr)  # str_remove()
library(purrr)    # map()
html_files <- list.files(file_directory, full.names = TRUE, recursive = TRUE)
rip_text <- function(court_file) {
  ripped_text <- read_html(court_file, options = "HUGE") |>
    html_text2() |> # Pull out only the text
    str_remove('^.*PubDate":"\\d{4}-\\d\\d-\\d\\d",\n') |>
    str_remove('\\}"; var jsonData.*$')
  return(ripped_text)
}
ripped_files <- map(html_files, rip_text)
Here is the error:
Error in `map()`:
i In index: 19531.
Caused by error in `UseMethod()`:
! no applicable method for 'xml_find_first' applied to an object of class "xml_document"
Run `rlang::last_trace()` to see where the error occurred.
Answer 1
Score: 1
If you don't care too much about excluding the couple of errors in question and want the bulk of the data, you could use purrr::safely(), as follows.
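library(rvest)    # read_html(), html_text2()
library(stringr)  # str_remove()
library(purrr)    # map(), safely()
html_files <- list.files(file_directory, full.names = TRUE, recursive = TRUE)
rip_text <- function(court_file) {
  read_html(court_file, options = "HUGE") |>
    html_text2() |> # Pull out only the text
    str_remove('^.*PubDate":"\\d{4}-\\d\\d-\\d\\d",\n') |>
    str_remove('\\}"; var jsonData.*$')
}
# safely() wraps rip_text so that an error is captured instead of stopping map()
rip_text_safe <- safely(rip_text)
ripped_files <- map(html_files, rip_text_safe)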
I can't test this (understandably, I don't have the files), but it should work for you.
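A function wrapped in safely() returns, for every input, a list with a `result` and an `error` element, so the output of map() needs one more step before it is usable. Below is a minimal sketch of that step. It does not use the real files: `toy_rip()` and the `inputs` vector are hypothetical stand-ins for `rip_text()` and `html_files`, chosen only so that the example runs on its own.

```r
library(purrr)

# Hypothetical stand-in for rip_text(): it fails on the input "bad" so the
# example can run without any HTML files.
toy_rip <- function(x) {
  if (x == "bad") stop("could not parse input")
  toupper(x)
}

toy_rip_safe <- safely(toy_rip)
inputs <- c("a", "bad", "c")   # stand-in for html_files
out <- map(inputs, toy_rip_safe)

# Each element of `out` is list(result = ..., error = ...);
# `error` is NULL when the call succeeded.
failed <- map_lgl(out, \(x) !is.null(x$error))

ripped_ok   <- map(out[!failed], "result")   # successful results only
failed_idx  <- which(failed)                 # positions of the failing inputs
failed_msgs <- map_chr(out[failed], \(x) conditionMessage(x$error))
```

With the real data, indexing `html_files` by the failing positions (here `failed_idx`) would give the files to inspect by hand, and the successful results could then be saved with saveRDS().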