`tidyr::crossing()` with a list of tibbles unexpectedly reduces rows
Question
<details>
<summary>English:</summary>
Below is a reprex of my confusion. I expect that crossing a 1-row tibble with a 2-row tibble gives a 2-row tibble, and this does work with atomic column types. However, if the 2-row tibble's column is a list of data frames, I don't get the expected 2-row table but a 1-row table. This does not make sense to me. Can someone explain why this is what I should expect, or whether there is something I'm missing?
Cross-posted as a bug on GitHub: https://github.com/tidyverse/tidyr/issues/1487
``` r
library(tibble)
#> Warning: package 'tibble' was built under R version 4.2.2
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.2.2
#;; This makes sense, is expected:
#;; Crossing a 1-row table with a 2-row table, I get a two row table.
(df1 <- tibble(x=1))
#> # A tibble: 1 × 1
#> x
#> <dbl>
#> 1 1
(df2 <- tibble(y=1:2))
#> # A tibble: 2 × 1
#> y
#> <int>
#> 1 1
#> 2 2
crossing(df1, df2)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <int>
#> 1 1 1
#> 2 1 2
#;; This does not make sense.
#;; If the second 2-row table is a list of dataframes, I still expect a 2-row
#;; table, but I get a 1-row table.
(df3 <- tibble(y=list(tibble(y=2), tibble(y=2))))
#> # A tibble: 2 × 1
#> y
#> <list>
#> 1 <tibble [1 × 1]>
#> 2 <tibble [1 × 1]>
crossing(df1, df3)
#> # A tibble: 1 × 2
#> x y
#> <dbl> <list>
#> 1 1 <tibble [1 × 1]>
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 Patched (2022-07-06 r82554 ucrt)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/Chicago
#> date 2023-02-26
#> pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.2)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
#> dplyr 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
#> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.2)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
#> fs 1.6.1 2023-02-06 [1] CRAN (R 4.2.2)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
#> knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.2)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.2)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.2)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.2)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
#> rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.2)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
#> styler 1.9.0 2023-01-15 [1] CRAN (R 4.2.2)
#> tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.2)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
#> vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.2.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
#> xfun 0.37 2023-01-31 [1] CRAN (R 4.2.2)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
#>
#> [1] C:/Users/irinzn/R/R-4.2.1patched/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
<sup>Created on 2023-02-26 with reprex v2.0.2</sup>
Answer 1
Score: 1
See `?crossing`:

> `crossing()` is a wrapper around `expand_grid()` that de-duplicates and sorts its inputs.
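
The de-duplication step described in the help page can be illustrated directly on `df3` from the question. This is a minimal sketch, assuming the vctrs package (a tidyr dependency) is installed; it mimics the effect of the de-duplication and is not a copy of tidyr's internal code. `df4` is a hypothetical table added for contrast.

```r
library(tibble)
library(vctrs)

# Both rows of df3 hold the identical tibble `tibble(y = 2)`
df3 <- tibble(y = list(tibble(y = 2), tibble(y = 2)))
nrow(vec_unique(df3))  # row-wise de-duplication keeps only one row
#> [1] 1

# Hypothetical table whose list elements differ: both rows survive
df4 <- tibble(y = list(tibble(y = 1), tibble(y = 2)))
nrow(vec_unique(df4))
#> [1] 2
```

vctrs compares list elements by their contents, so two separately created but identical tibbles count as duplicates.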
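
Both elements of `df3$y` are the same tibble, `tibble(y = 2)`, so de-duplication leaves a single row. Use `expand_grid()` instead, which keeps every input row: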
```r
expand_grid(df1, df3)
#> # A tibble: 2 × 2
#>       x y
#>   <dbl> <list>
#> 1     1 <tibble [1 × 1]>
#> 2     1 <tibble [1 × 1]>
```
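
To contrast the two functions on the same inputs, here is a small sketch, assuming a tidyr version with the same `crossing()` semantics as the 1.3.0 used in the question. `crossing()` collapses the duplicated list element, `expand_grid()` keeps both rows, and `crossing()` still returns two rows once the list elements actually differ. `df4` is a hypothetical example table, not from the question.

```r
library(tibble)
library(tidyr)

df1 <- tibble(x = 1)
df3 <- tibble(y = list(tibble(y = 2), tibble(y = 2)))  # identical elements
df4 <- tibble(y = list(tibble(y = 1), tibble(y = 2)))  # distinct elements (hypothetical)

nrow(crossing(df1, df3))     # duplicate row removed before crossing
#> [1] 1
nrow(expand_grid(df1, df3))  # no de-duplication
#> [1] 2
nrow(crossing(df1, df4))     # nothing to remove, both rows kept
#> [1] 2
```

In other words, the row count changes only because `df3` contains a repeated value, exactly as `df2` would if it were `tibble(y = c(1, 1))`.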