Unnesting/rectangling/flattening a nested list using `tidyr::unnest_longer()`

Question

I've been trying to get my head around the unnesting functions in tidyr and tibblify. I believe you should be able to use unnest_longer() to replicate the more manual methods below of turning this kind of nested list into a tibble, but I've been struggling with the docs a little. A correct example of how to do this would help me immensely:

# Example nested list
nl <- list(time = list("2023-02-06", "2023-02-07", "2023-02-08",
                       "2023-02-09", "2023-02-10", "2023-02-11",
                       "2023-02-12"), 
           precipitation_sum = list(0.9, 0, 0, 0.3, 0, 0, 0))

# one way to do it (extract colnames and construct)
tibble(!!! setNames(map(nl, unlist),names(nl)))

# another way (collect & reduce each sublist)
as_tibble(lapply(nl, function(x) Reduce(c, x)))

# how to use tidyr and unnest_longer? (below is incorrect)
unnest_longer(tibble(nl), col = everything())

Answer 1

Score: 4

We could use

library(tibble)
library(tidyr)
as_tibble(nl) %>%
    unnest(cols = where(is.list))

-output

# A tibble: 7 × 2
  time       precipitation_sum
  <chr>                  <dbl>
1 2023-02-06               0.9
2 2023-02-07               0  
3 2023-02-08               0  
4 2023-02-09               0.3
5 2023-02-10               0  
6 2023-02-11               0  
7 2023-02-12               0  

Or more compactly

library(purrr)
map_dfc(nl, unlist)
# A tibble: 7 × 2
  time       precipitation_sum
  <chr>                  <dbl>
1 2023-02-06               0.9
2 2023-02-07               0  
3 2023-02-08               0  
4 2023-02-09               0.3
5 2023-02-10               0  
6 2023-02-11               0  
7 2023-02-12               0  
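
Since the question asks specifically about unnest_longer(), here is a minimal sketch of that route. It assumes a tidyr version in which unnest_longer() accepts a multi-column selection (to my knowledge 1.2.0 or later); the name `res` is only for illustration. The attempt in the question goes wrong at tibble(nl), which stores the whole list in a single list-column named `nl`, whereas as_tibble(nl) gives one list-column per name.

```r
library(tibble)
library(tidyr)

# Same example list as in the question
nl <- list(
  time = list("2023-02-06", "2023-02-07", "2023-02-08",
              "2023-02-09", "2023-02-10", "2023-02-11",
              "2023-02-12"),
  precipitation_sum = list(0.9, 0, 0, 0.3, 0, 0, 0)
)

# tibble(nl): one list-column called `nl`, with 2 rows
dim(tibble(nl))
# as_tibble(nl): one list-column per name, with 7 rows
dim(as_tibble(nl))

# Unnest every list-column in one call
# (assumption: multi-column selection needs tidyr >= 1.2.0)
res <- unnest_longer(as_tibble(nl), col = everything())
res
```

Every element here has length one, so the row count stays at 7 and the two list-columns simply become character and double columns. On older tidyr versions, chaining one unnest_longer() call per column should give the same result.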

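Answer 2

Score: 1

Another interesting option is to use dmap (and the history behind dmap):

'purrrlyr contains some functions that lie at the intersection of purrr and dplyr. They have been removed from purrr in order to make the package lighter and because they have been replaced by other solutions in the tidyverse.' <https://github.com/hadley/purrrlyr/>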

#install.packages("purrrlyr")
library(purrrlyr)
nl %>%
  dmap(unlist)
  time       precipitation_sum
  <chr>                  <dbl>
1 2023-02-06               0.9
2 2023-02-07               0  
3 2023-02-08               0  
4 2023-02-09               0.3
5 2023-02-10               0  
6 2023-02-11               0  
7 2023-02-12               0 

Posted by huangapple on 2023-02-06 12:46:41
Source: https://go.coder-hub.com/75357416.html