Problem in `map()`: caused by error in `UseMethod()`

Question

I tried to map over a folder of .html files to extract their text so it can be saved as RDS, but the function sometimes fails, as shown below:

    library(rvest)    # read_html(), html_text2()
    library(stringr)  # str_remove()
    library(purrr)    # map()

    html_files <- list.files(file_directory, full.names = TRUE, recursive = TRUE)

    rip_text <- function(court_file) {
      ripped_text <- read_html(court_file, options = "HUGE") |>
        html_text2() |> # Pull out only the text
        str_remove('^.*PubDate":"\\d{4}-\\d\\d-\\d\\d",\n') |>
        str_remove('\\}"; var jsonData.*$')
      return(ripped_text)
    }

    ripped_files <- map(html_files, rip_text)

Here is the error:

    Error in map():
    i In index: 19531.
    Caused by error in UseMethod():
    ! no applicable method for 'xml_find_first' applied to an object of class "xml_document"
    Run rlang::last_trace() to see where the error occurred.

Answer 1

Score: 1

If you don't mind dropping the few files that raise errors and mainly want the bulk of the data, you could use purrr::safely(), as follows.

    library(rvest)    # read_html(), html_text2()
    library(stringr)  # str_remove()
    library(purrr)    # map(), safely()

    html_files <- list.files(file_directory, full.names = TRUE, recursive = TRUE)

    rip_text <- function(court_file) {
      read_html(court_file, options = "HUGE") |>
        html_text2() |> # Pull out only the text
        str_remove('^.*PubDate":"\\d{4}-\\d\\d-\\d\\d",\n') |>
        str_remove('\\}"; var jsonData.*$')
    }

    # safely() returns a wrapper that captures errors instead of stopping map()
    rip_text_safe <- safely(rip_text)
    ripped_files <- map(html_files, rip_text_safe)

I can't test this (understandably, I don't have the files), but it should work for you.
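With a safely()-wrapped function, each element that map() returns is a list with a `result` and an `error` component, so the successes and failures still need to be separated afterwards. Here is a minimal sketch of one way to do that. Since the original files aren't available, `toy_rip()` and `inputs` are hypothetical stand-ins for `rip_text()` and `html_files`.

```r
library(purrr)

# Hypothetical stand-in for rip_text(): it fails on some inputs, the way
# parsing might fail on a problematic file.
toy_rip <- function(x) {
  if (x == "bad") stop("could not parse input")
  toupper(x)
}

toy_rip_safe <- safely(toy_rip)

inputs <- c("a", "bad", "c")  # stand-in for html_files
out <- map(inputs, toy_rip_safe)

# Each element of `out` is list(result = ..., error = ...);
# `error` is NULL when the call succeeded.
failed <- map_lgl(out, \(x) !is.null(x$error))

good_text    <- map_chr(out[!failed], "result")  # text from the inputs that worked
bad_inputs   <- inputs[failed]                   # inputs to inspect by hand
bad_messages <- map_chr(out[failed], \(x) conditionMessage(x$error))
```

Applied to the real data, `html_files[failed]` would give the paths of the files that raised errors, which may help track down what is different about them.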

huangapple
  • Published on July 12, 2023 at 21:06:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76670914.html