How to prevent dplyr::select from combining names rather than assigning a new name?

Question

I'm trying to select a column based on a named vector and assign a new name to that column at the same time. However, dplyr appears to combine the two names, and I can't find an option in the documentation to stop this.

# dplyr ‘1.1.2’
# R Version 4.3.0
library(dplyr)  # provides %>%

data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)

new_values <- c(period = "day",
                n = 365)

# This combines the names rather than assigning a new name
data %>%
  dplyr::select(time = new_values[1])

# e.g.
#   time...period
# 1             1

# I want it to behave like this
data %>%
  dplyr::select(new_values[1]) %>%
  dplyr::rename(time = period)

Answer 1

Score: 2


Issue

Your issue is that new_values still has its names():

data %>% dplyr::select(time = new_values[1])
#>   time...period
#> 1             1

data %>% dplyr::select(time = unname(new_values)[1])
#>   time
#> 1    1
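
A quick way to see where the extra name comes from is to compare what `[` and `[[` return. This is a small base-R sketch using the question's `new_values`; it relies only on standard subsetting behaviour:

```r
new_values <- c(period = "day", n = 365)

# `[` keeps the element's name, so select() receives a named vector
names(new_values[1])
#> [1] "period"

# `[[` and unname() both drop the name, leaving a bare string
names(new_values[[1]])
#> NULL
names(unname(new_values)[1])
#> NULL
```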

This behavior is intentional: it is part of the "tidy selection" used by dplyr::select(). Passing a named character vector of column names (like new_values) lets a programmatic user "combine" and "propagate" names across nested selections. The documentation illustrates this with symbols rather than strings:

mtcars %>% select_loc(foo = c(mpg, cyl))
#> foo1 foo2
#>    1    2

mtcars %>% select_loc(foo = c(bar = mpg, baz = cyl))
#> foo...bar foo...baz
#>         1         2

mtcars %>% select_loc(foo = c(bar = c(mpg, cyl)))
#> foo...bar1 foo...bar2
#>          1          2
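
Note that select_loc() is not exported by dplyr. As far as I can tell, the tidyselect vignettes define it as a thin wrapper around tidyselect::eval_select(), roughly as below. Treat this as a sketch that assumes the tidyselect and rlang packages are installed, not as the canonical definition:

```r
library(tidyselect)

# Assumed definition of the helper: returns the selected column
# positions, named with the column names the selection would produce.
select_loc <- function(data, ...) {
  eval_select(rlang::expr(c(...)), data)
}

# An outer name combined with inner names yields "outer...inner"
names(select_loc(mtcars, foo = c(bar = mpg, baz = cyl)))
#> [1] "foo...bar" "foo...baz"
```

With this helper defined, the quoted examples can be run as written once %>% is available (for example via library(dplyr)).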

Solution

While unname() does the job, you're better off just using [[ to extract the value without the name (period)...

#                                       |---|
data %>% dplyr::select(time = new_values[[1]])
#>   time
#> 1    1

#                                       |----------|
data %>% dplyr::select(time = new_values[["period"]])
#>   time
#> 1    1

...or better yet, making new_values a list, so its values (like 365) are not all coerced to strings (like "365") in a character vector:

# Original 'new_values' as a vector...
new_values <- c(period = "day", n = 365)
new_values
#> period      n
#>  "day"  "365"

# ...and new 'new_values' as a list:
new_values <- list(period = "day", n = 365)
new_values
#> $period
#> [1] "day"
#>
#> $n
#> [1] 365


# Easily select() what you want:        |-----|
data %>% dplyr::select(time = new_values$period)
#>   time
#> 1    1
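
Another option, not part of the original answer, is to put the rename into the vector itself and pass it through all_of(), which uses the vector's names as the new column names. A minimal sketch, assuming dplyr is attached; lookup is a hypothetical variable name chosen for illustration:

```r
library(dplyr)

data <- data.frame(day = 1, week = 2, n_in_year = 365)
new_values <- list(period = "day", n = 365)

# Named character vector: name = new column name, value = existing column
lookup <- c(time = new_values$period)

result <- data %>% select(all_of(lookup))
result
#>   time
#> 1    1
```

This keeps the old-to-new mapping in one object, which may be convenient when several columns need renaming at once.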

Answer 2

Score: 1

Please check the updated code below:

# dplyr ‘1.1.2’
# R Version 4.3.0

data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)

new_values <- c(period = "day",
                n = 365)

# [[ drops the name "period", so the column is simply renamed to time
data %>%
  dplyr::select(time = new_values[[1]])

#>   time
#> 1    1

# This is equivalent to selecting the column and then renaming it
data %>%
  dplyr::select(new_values[[1]]) %>%
  dplyr::rename(time = day)

huangapple
  • Posted on June 27, 2023 at 20:14:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76564753.html