July 13, 2023, 15:55:37

How to make tidyr::complete work when nesting the same column twice?

Question

For some reason, my code repeats a column name inside complete (the call sits inside a function, and the argument may be a column that is already present).

library(tidyverse)
# Works
mtcars %>% complete(nesting(cyl, gear), vs)
# Doesn't work
mtcars %>% complete(nesting(cyl, cyl), vs)
# Error in `dplyr::full_join()`:
# ! Join columns in `y` must be present in the data.
# ✖ Problem with `cyl...1` and `cyl...2`.

I don't want the column cyl to be repeated in the result.

How to make this work elegantly?

Answer 1

Score: 1

Edit: if the goal is to drop duplicated column names inside nesting(), you could wrap it in your own function:

# Wrapper around nesting() that keeps only the first of any identically named columns
my_nesting <- function(...) {
  nesting(..., .name_repair = "minimal") %>%
    subset(select = !duplicated(colnames(.)))
}
mtcars %>% complete(my_nesting(cyl = cyl, cyl = cyl), vs)
# Works

Why is it not working?

(Not really an answer, but this is too long to be a comment.)

In your example, I'm not sure what the point of repeating the column is, since the unique combinations of two identical columns are the same as the unique values of one of them (see below).

With the default behavior

Under the hood, complete() calls expand() and then full_join(). expand() works with duplicate names:

data <- mtcars
out <- expand(data, nesting(cyl, cyl), vs)

But full_join() does not, because the names have changed and no longer match any column in data:

dplyr::full_join(out, data, by = names(out))

Indeed, nesting() (like expand()) defaults to .name_repair = "check_unique", and the repeated column comes back under repaired names:

with(mtcars, nesting(cyl, cyl)) 
#   cyl...1 cyl...2
# 1       4       4
# 2       6       6
# 3       8       8

With .name_repair = "minimal"

If you specify .name_repair = "minimal" and also name the columns explicitly with identical names, both columns keep the same name:

with(mtcars, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"))
#     cyl   cyl
# 1     4     4
# 2     6     6
# 3     8     8

But here full_join() blocks the process again, because column names must be unique:

out <- expand(data, nesting(cyl = cyl, cyl = cyl, .name_repair = "minimal"), vs, .name_repair = "minimal")
dplyr::full_join(out, data, by = names(out))
#Error in `dplyr::full_join()`:
#! Input columns in `x` must be unique.
#✖ Problem with `cyl`.

So, what can be done?

Not much: there seems to be no reason to nest identical columns, since the unique combinations are the same as for a single column.
You could instead create additional variables in your original data frame whose names match the column names created during the process.
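
A minimal sketch of a variation on that last suggestion, assuming an alias column is acceptable. The name cyl2 is hypothetical (it is not part of mtcars); it is added only so that nesting() sees two distinct names, and it is dropped again afterwards.

```r
library(dplyr)
library(tidyr)

# Assumption: `cyl2` is a made-up alias for `cyl`, introduced for illustration only
res <- mtcars %>%
  mutate(cyl2 = cyl) %>%                # copy the column under a distinct name
  complete(nesting(cyl, cyl2), vs) %>%  # both names now exist in the data, so the join succeeds
  select(-cyl2)                         # drop the alias once the missing combinations are filled in

names(res)
```

Because cyl2 is an exact copy of cyl, this should fill in the same combinations as complete(nesting(cyl), vs).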

Answer 2

Score: 1

You can replace nesting() with select(). select() returns a column only once even when it is selected twice, so this solves your issue.

library(dplyr)
library(tidyr)
mtcars %>% complete(select(., cyl, gear), vs)
# A tibble: 38 × 11
#     cyl  gear    vs   mpg  disp    hp  drat    wt  qsec    am  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     4     3     0  NA    NA      NA NA    NA     NA      NA    NA
# 2     4     3     1  21.5 120.     97  3.7   2.46  20.0     0     1
# 3     4     4     0  NA    NA      NA NA    NA     NA      NA    NA
# 4     4     4     1  22.8 108      93  3.85  2.32  18.6     1     1
# 5     4     4     1  24.4 147.     62  3.69  3.19  20       0     2
# 6     4     4     1  22.8 141.     95  3.92  3.15  22.9     0     2
# 7     4     4     1  32.4  78.7    66  4.08  2.2   19.5     1     1
# 8     4     4     1  30.4  75.7    52  4.93  1.62  18.5     1     2
# 9     4     4     1  33.9  71.1    65  4.22  1.84  19.9     1     1
#10     4     4     1  27.3  79      66  4.08  1.94  18.9     1     1
# ℹ 28 more rows
# ℹ Use `print(n = ...)` to see more rows

Compare the results of select() and nesting() to confirm that they are identical:

identical(mtcars %>% complete(nesting(cyl, gear), vs),
          mtcars %>% complete(select(., cyl, gear), vs))
#[1] TRUE
identical(mtcars %>% complete(nesting(cyl), vs),
          mtcars %>% complete(select(., cyl, cyl), vs))
#[1] TRUE
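
For the situation described in the question, where the call sits inside a function and the argument may name a column that is already listed, the same idea could be sketched roughly as follows. The helper complete_cyl_with and its extra argument are hypothetical names, and passing the column as a string is an assumption made to keep the example simple.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: complete `cyl` together with one more column, given as a string
complete_cyl_with <- function(data, extra) {
  cols <- unique(c("cyl", extra))   # drop the duplicate if `extra` is "cyl"
  # An unnamed data frame of key columns is used in place of nesting(), as in the answer above
  complete(data, select(data, all_of(cols)), vs)
}

dup <- complete_cyl_with(mtcars, "cyl")    # "cyl" requested twice
two <- complete_cyl_with(mtcars, "gear")   # two distinct columns
```

Here dup should contain a single cyl column, and two should match the select(., cyl, gear) call shown above.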
