Is there an alternative to tidyr::unnest_wider? It fails when nested list element is not a vector

Question

I have old code that used tidyr::unnest_wider() to unnest a nested named list into separate columns, but it no longer works. Instead I get an error saying x$name_of_list must be a vector, not a <non-vector> object, where my non-vector objects include <mcpfit> and <patchwork/gg/ggplot> objects. It seems this was meant to be addressed in a tidyr GitHub issue, but it still fails with tidyr v. 1.3.0.

I couldn't easily create a reproducible example from my own use case, so I'll use the example from that GitHub issue in the hope that a fix for it will also work for my use case.

library(tidyverse)

m <-
  tibble::as_tibble(mtcars[1,]) %>%
  mutate(ls_col=list(
    list(
      a=c(1:10),
      b=lm(cyl~gear))
    )
  )

m2 <-
  m %>%
  unnest_wider(ls_col)

I am looking for EITHER an alternative data.table or base R solution OR a tidyverse workaround (e.g., remove the non-vector objects from the nested list and then use tidyr::unnest_wider()). tidyr::unnest() seems to work, but then I don't know how to pivot the column containing the lists into their own columns (R crashes every time I try something like this).

Answer 1

Score: 4

You can specify strict = TRUE.

library(tidyverse)

m <- tibble::as_tibble(mtcars[1,]) %>%
  mutate(ls_col= list(
    list(
      a=c(1:10),
      b=lm(cyl~gear))
  ))

m %>%
  unnest_wider(ls_col, strict = TRUE)
#> # A tibble: 1 x 13
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb a      b    
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <lis>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 <int>  <lm>

Why?

The strict argument defaults to FALSE, and in this state, unnest_wider will convert zero-length typed objects like numeric() or character() in your list to NA, which can be helpful in converting lists with zero-length items into a typed column, for example:

m <- tibble(ls_col = list(list(a = character()), list(a = 1)))

m %>% unnest_wider(ls_col, strict = FALSE)
#> # A tibble: 2 x 1
#>       a
#>   <dbl>
#> 1    NA
#> 2     1

Whereas with strict = TRUE, type is strictly preserved, which means in this case we end up with a list column:

m %>% unnest_wider(ls_col, strict = TRUE)
#> # A tibble: 2 x 1
#>   a        
#>   <list>   
#> 1 <chr [0]>
#> 2 <dbl [1]>

The default strict = FALSE can come in handy in some circumstances, since it helps rearrange complex lists that contain some empty items (as when parsing certain JSON structures). To achieve this, unnest_wider uses the function vctrs::list_sizes (via the non-exported function elt_to_wide), which throws an error if the list contains non-vector items:

vctrs:::list_sizes(list(a = 1, b = lm(cyl~gear, mtcars)))
#> Error in `vctrs:::list_sizes()`:
#> ! `x$b` must be a vector, not a <lm> object.
#> Run `rlang::last_trace()` to see where the error occurred.
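
To see which elements would trip this check before unnesting, one option is to test each element with vctrs::vec_is(). The sketch below is an illustration, not part of the original reprex: the names nested, is_vec, and vector_only are made up for the example, and it assumes vec_is() applies the same definition of "vector" that list_sizes() enforces.

```r
library(vctrs)

# A nested list like the one in the question: one plain vector, one model object.
nested <- list(a = 1:10, b = lm(cyl ~ gear, data = mtcars))

# TRUE for elements that vctrs treats as vectors, FALSE otherwise.
is_vec <- vapply(nested, vec_is, logical(1))

# Names of the elements that are not vectors in the vctrs sense.
non_vector_names <- names(nested)[!is_vec]

# Keep only the vector elements, e.g. before calling unnest_wider().
vector_only <- nested[is_vec]
```

Dropping the non-vector elements this way matches the workaround the question suggests, at the cost of losing those objects from the result.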

I wouldn't call this behaviour a bug as such, but it's a bit unintuitive and feels like we are using strict = TRUE for a reason other than its design rationale. However, it does work here.
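
If you would rather avoid tidyr altogether, as the question asks, a base R approach is also possible. The following is a minimal sketch, not a drop-in replacement for unnest_wider(): widen_list_col() is a hypothetical helper, it assumes every row of the column holds a named list, and it always produces list-columns instead of simplifying them.

```r
# Hypothetical helper: widen a list-column of named lists using base R only.
widen_list_col <- function(df, col) {
  nested <- df[[col]]
  # Collect every name that appears in any row's nested list.
  keys <- unique(unlist(lapply(nested, names)))
  # Start from the original columns minus the nested one.
  out <- df[setdiff(names(df), col)]
  for (key in keys) {
    # Each new column is itself a list, so any object (lm, ggplot, ...) fits.
    out[[key]] <- lapply(nested, function(el) el[[key]])
  }
  out
}

# Small example data: two rows, each with a vector and a model object.
df <- mtcars[1:2, c("mpg", "cyl")]
df$ls_col <- list(
  list(a = 1:10, b = lm(cyl ~ gear, data = mtcars)),
  list(a = 11:20, b = lm(mpg ~ wt, data = mtcars))
)

wide <- widen_list_col(df, "ls_col")
```

Because nothing is simplified, a column such as a stays a list of integer vectors; that keeps the helper indifferent to what the elements are, which is the point when they include model or plot objects.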

Created on 2023-08-04 with reprex v2.0.2

huangapple
  • Posted on 2023-08-05 03:07:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838597.html