How to make tidyr::complete work when nesting the same column twice?

Question

For some reason, my code repeats a column name inside complete (the call sits inside a function, and the argument may be a column that is already present).

library(tidyverse)
# Works
mtcars %>% complete(nesting(cyl, gear), vs)

# Doesn't work
mtcars %>% complete(nesting(cyl, cyl), vs)

# Error in `dplyr::full_join()`:
# ! Join columns in `y` must be present in the data.
# ✖ Problem with `cyl...1` and `cyl...2`.

I don't want the column cyl to be repeated in the result.

How can I make this work elegantly?

Answer 1

Score: 1

Edit: If the idea is to remove duplicated column names in nesting, you could use your own function:

# Like nesting(), but keeps only the first column for each repeated name
my_nesting <- function(...){
  nesting(..., .name_repair = "minimal") %>%
    subset(select = !duplicated(colnames(.)))
}

mtcars %>% complete(my_nesting(cyl = cyl, cyl = cyl), vs)
# Works

Why is it not working?

Not really an answer, but this is too long to be a comment.

In your example, I'm not sure what the point of repeating the column is, since the combinations of two identical columns are the same as the combinations of that column alone (see below).
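As a quick check of that claim, here is a minimal sketch. The names single, paired, a and b are only illustrative; the two copies of cyl get distinct names so that name repair does not come into play.

```r
library(tidyr)

# Unique values of cyl on its own
single <- with(mtcars, nesting(cyl))

# Unique combinations of cyl paired with itself
paired <- with(mtcars, nesting(a = cyl, b = cyl))

nrow(single) == nrow(paired)   # same number of combinations
all(paired$a == paired$b)      # every row pairs a value with itself
```

Both comparisons should come back TRUE, which is why nesting the same column twice adds no information.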

With the default behavior

Under the hood, complete uses full_join and expand. expand works with duplicate names:

data <- mtcars
out <- expand(data, nesting(cyl, cyl), vs)

But full_join fails, because the names in the expanded result (cyl...1 and cyl...2) no longer match any column in data:

dplyr::full_join(out, data, by = names(out))

Indeed, nesting (and expand) default to .name_repair = "check_unique", and duplicated unnamed arguments come back with suffixed names:

with(mtcars, nesting(cyl, cyl)) 

#   cyl...1 cyl...2
# 1       4       4
# 2       6       6
# 3       8       8

With .name_repair = "minimal"

If you name the columns identically yourself and also set .name_repair = "minimal", both columns keep the same name:

with(mtcars, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"))

#     cyl   cyl
# 1     4     4
# 2     6     6
# 3     8     8

But here full_join fails again, because the column names of its inputs must be unique:

out <- expand(data, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"), vs, .name_repair = "minimal")
dplyr::full_join(out, data, by = names(out))

# Error in `dplyr::full_join()`:
# ! Input columns in `x` must be unique.
# ✖ Problem with `cyl`.

So, what can be done?

  • Not much: there is no apparent reason to nest identical columns, since the unique combinations will be the same.
  • You could create extra variables in your original data.frame that match the column names created in the process.
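Since the repetition comes from a function argument, another option is to drop repeated names before they reach nesting. The sketch below is an assumption rather than part of the original answer: complete_nested and its cols argument are hypothetical, and it relies on rlang's syms() and the !!! operator to splice the deduplicated names into nesting.

```r
library(tidyr)
library(rlang)

# Hypothetical helper: `cols` is a character vector of column names that may
# contain repeats; anything in `...` is passed on to complete() unchanged.
complete_nested <- function(data, cols, ...) {
  # Drop repeated names, then turn the remaining names into symbols
  nest_cols <- syms(unique(cols))
  complete(data, nesting(!!!nest_cols), ...)
}

complete_nested(mtcars, c("cyl", "cyl"), vs)
```

Because the names are deduplicated up front, the call above should behave like mtcars %>% complete(nesting(cyl), vs), and cyl appears only once in the result.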

Answer 2

Score: 1

You can replace nesting with select. select returns each column only once even if you list it twice, so the duplicate never reaches complete.

library(dplyr)
library(tidyr)

mtcars %>% complete(select(., cyl, gear), vs)

# A tibble: 38 × 11
#     cyl  gear    vs   mpg  disp    hp  drat    wt  qsec    am  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     4     3     0  NA    NA      NA NA    NA     NA      NA    NA
# 2     4     3     1  21.5 120.     97  3.7   2.46  20.0     0     1
# 3     4     4     0  NA    NA      NA NA    NA     NA      NA    NA
# 4     4     4     1  22.8 108      93  3.85  2.32  18.6     1     1
# 5     4     4     1  24.4 147.     62  3.69  3.19  20       0     2
# 6     4     4     1  22.8 141.     95  3.92  3.15  22.9     0     2
# 7     4     4     1  32.4  78.7    66  4.08  2.2   19.5     1     1
# 8     4     4     1  30.4  75.7    52  4.93  1.62  18.5     1     2
# 9     4     4     1  33.9  71.1    65  4.22  1.84  19.9     1     1
#10     4     4     1  27.3  79      66  4.08  1.94  18.9     1     1
# ℹ 28 more rows
# ℹ Use `print(n = ...)` to see more rows

Compare the results of select and nesting to confirm that they are identical:

identical(mtcars %>% complete(nesting(cyl, gear), vs), 
          mtcars %>% complete(select(., cyl, gear), vs))
#[1] TRUE

identical(mtcars %>% complete(nesting(cyl), vs),
          mtcars %>% complete(select(., cyl, cyl), vs))
#[1] TRUE
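The reason the second comparison holds can be seen by looking at select on its own. This is a small sketch, with dup as an illustrative name:

```r
library(dplyr)

# Listing the same column twice still returns it only once
dup <- select(mtcars, cyl, cyl)
names(dup)
# [1] "cyl"
```

So select(., cyl, cyl) hands complete a single cyl column, which is the same thing nesting(cyl) describes.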

huangapple
  • Published on July 13, 2023 at 15:55:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76677124.html