How to make tidyr::complete work when nesting the same column twice?
Question
For some reason, my code repeats a column's name inside complete (it's inside a function, and the argument may be a column that is already present).
library(tidyverse)
# Works
mtcars %>% complete(nesting(cyl, gear), vs)
# Doesn't work
mtcars %>% complete(nesting(cyl, cyl), vs)
# Error in `dplyr::full_join()`:
# ! Join columns in `y` must be present in the data.
# ✖ Problem with `cyl...1` and `cyl...2`.
I don't want the column cyl to be repeated in the result.
How to make this work elegantly?
Answer 1 (score: 1)
Edit: If the idea is to remove duplicated column names in nesting, you could use your own function:
my_nesting <- function(...){
nesting(..., .name_repair = "minimal") %>%
subset(select = !duplicated(colnames(.)))
}
mtcars %>% complete(my_nesting(cyl = cyl, cyl = cyl), vs)
#Works
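A variant of the same idea that indexes the columns directly instead of going through subset() and the magrittr . placeholder. This is a sketch, not the answerer's code: the name my_nesting_base is hypothetical, and, like my_nesting, it only deduplicates arguments that end up with the same name (here because both are explicitly named cyl = cyl).

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of my_nesting(): build the nesting tibble with
# minimal name repair, then keep only the first column for each name.
my_nesting_base <- function(...) {
  out <- nesting(..., .name_repair = "minimal")
  out[!duplicated(names(out))]
}

res <- mtcars %>% complete(my_nesting_base(cyl = cyl, cyl = cyl), vs)
names(res)  # contains a single "cyl" column
nrow(res)   # 33: 32 original rows + 1 new row (cyl = 8, vs = 1)
```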
Why is it not working?
Not really an answer, but this is too long to be a comment.
In your example case, I'm not sure what the point of repeating the column twice is, since the unique combinations of two identical columns are the same as those of either column alone (see below).
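A quick base-R check of this point on mtcars (a and b are just placeholder column names):

```r
# Two identical columns have exactly as many unique combinations
# as one of them alone.
n_one <- length(unique(mtcars$cyl))
n_two <- nrow(unique(data.frame(a = mtcars$cyl, b = mtcars$cyl)))
c(n_one, n_two)  # 3 3
```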
With the default behavior
Under the hood, complete uses full_join and expand. expand works with duplicate names:
data <- mtcars
out <- expand(data, nesting(cyl, cyl), vs)
But full_join does not, because the names have changed:
dplyr::full_join(out, data, by = names(out))
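The two steps can be reproduced by hand for the call that does work in the question. This is a simplified sketch for illustration, not tidyr's actual source (the real complete() also handles its fill and explicit arguments):

```r
library(dplyr)
library(tidyr)

data <- mtcars

# Step 1: every observed (cyl, gear) pair crossed with every vs value
grid <- expand(data, nesting(cyl, gear), vs)

# Step 2: join the grid back onto the data by the grid's column names;
# grid rows with no match in the data become rows filled with NA
joined <- full_join(grid, data, by = names(grid))

nrow(grid)    # 16 = 8 (cyl, gear) pairs x 2 vs values
nrow(joined)  # 38 = 32 original rows + 6 missing combinations
nrow(complete(data, nesting(cyl, gear), vs))  # also 38
```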
Indeed, when there are identical names, the default behavior of nesting (and expand) is .name_repair = "check_unique":
with(mtcars, nesting(cyl, cyl))
# cyl...1 cyl...2
# 1 4 4
# 2 6 6
# 3 8 8
With .name_repair = "minimal"
If you specify .name_repair = "minimal" and also explicitly give both columns the same name, you get two columns with the same name:
with(mtcars, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"))
# cyl cyl
# 1 4 4
# 2 6 6
# 3 8 8
But here, full_join blocks the process again, because the column names must be unique:
out <- expand(data, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"), vs, .name_repair = "minimal")
dplyr::full_join(out, data, by = names(out))
#Error in `dplyr::full_join()`:
#! Input columns in `x` must be unique.
#✖ Problem with `cyl`.
So, what can be done?
- Not much: there does not seem to be any reason to nest identical columns, since the unique combinations will be the same.
- You could create other variables in your original data.frame to match the column names created through the process.
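One way to sketch the idea of adding variables to the data, assuming the repeated column can be given a different name: copy it under a temporary name (cyl_copy is a hypothetical choice), nest on the original and the copy, then drop the copy. Since no two columns share a name, full_join has nothing to complain about.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  mutate(cyl_copy = cyl) %>%                # hypothetical temporary copy
  complete(nesting(cyl, cyl_copy), vs) %>%  # all names are now unique
  select(-cyl_copy)                         # remove the helper column

nrow(res)  # 33: the only missing combination is cyl = 8, vs = 1
```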
Answer 2 (score: 1)
You may replace nesting with select. select does not keep duplicate columns (selecting the same column twice returns it only once), so this would solve your issue.
library(dplyr)
library(tidyr)
mtcars %>% complete(select(., cyl, gear), vs)
# A tibble: 38 × 11
# cyl gear vs mpg disp hp drat wt qsec am carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4 3 0 NA NA NA NA NA NA NA NA
# 2 4 3 1 21.5 120. 97 3.7 2.46 20.0 0 1
# 3 4 4 0 NA NA NA NA NA NA NA NA
# 4 4 4 1 22.8 108 93 3.85 2.32 18.6 1 1
# 5 4 4 1 24.4 147. 62 3.69 3.19 20 0 2
# 6 4 4 1 22.8 141. 95 3.92 3.15 22.9 0 2
# 7 4 4 1 32.4 78.7 66 4.08 2.2 19.5 1 1
# 8 4 4 1 30.4 75.7 52 4.93 1.62 18.5 1 2
# 9 4 4 1 33.9 71.1 65 4.22 1.84 19.9 1 1
#10 4 4 1 27.3 79 66 4.08 1.94 18.9 1 1
# ℹ 28 more rows
# ℹ Use `print(n = ...)` to see more rows
Comparing the result of select with that of nesting to make sure they give identical results:
identical(mtcars %>% complete(nesting(cyl, gear), vs),
mtcars %>% complete(select(., cyl, gear), vs))
#[1] TRUE
identical(mtcars %>% complete(nesting(cyl), vs),
mtcars %>% complete(select(., cyl, cyl), vs))
#[1] TRUE
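Since the question says the duplicate comes from a function argument, the select approach might be wrapped like this. The function complete_by and its character argument extra_col are hypothetical; all_of() takes column names from a character vector, and select() does not add a column that is already selected.

```r
library(dplyr)
library(tidyr)

# Hypothetical wrapper: complete over the observed (cyl, gear, extra_col)
# combinations crossed with vs. If extra_col is "cyl" or "gear",
# select() keeps a single copy of that column.
complete_by <- function(data, extra_col) {
  complete(data, select(data, cyl, gear, all_of(extra_col)), vs)
}

res <- complete_by(mtcars, "cyl")
identical(res, mtcars %>% complete(nesting(cyl, gear), vs))
```

Passing a name that is not already selected (for example "am") would instead add that column to the nested combinations.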