How to prevent dplyr::select from combining names rather than assigning a new name?
Question
I'm trying to select columns based on another named vector and assign a new name to that column at the same time. However, dplyr appears to combine the names, and I can't see an option to stop this in the documentation.
# dplyr ‘1.1.2’
# R Version 4.3.0
library(dplyr)  # provides %>%
data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)
new_values <- c(period = "day",
                n = 365)
# This combines the names rather than assigning a new name
data %>%
  dplyr::select(time = new_values[1])
# e.g.
#   time...period
# 1             1
# I want it to behave like this
data %>%
  dplyr::select(new_values[1]) %>%
  dplyr::rename(time = period)
Answer 1
Score: 2
Issue
Your issue is that new_values[1] still carries its name ("period") from names(new_values):
data %>% dplyr::select(time = new_values[1])
#>   time...period
#> 1             1
data %>% dplyr::select(time = unname(new_values)[1])
#>   time
#> 1    1
This behavior is intentional: it is part of the "tidy selection" used by dplyr::select(). Passing a named character vector of column names (like new_values) lets a programmatic user "combine" and "propagate" names across nested levels. This is illustrated by the documentation below, with symbols rather than strings:
>     mtcars %>% select_loc(foo = c(mpg, cyl))
>     #> foo1 foo2
>     #>    1    2
>     mtcars %>% select_loc(foo = c(bar = mpg, baz = cyl))
>     #> foo...bar foo...baz
>     #>         1         2
>     mtcars %>% select_loc(foo = c(bar = c(mpg, cyl)))
>     #> foo...bar1 foo...bar2
>     #>          1          2
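To see where the extra "period" comes from, it can help to inspect the names directly. The following is a minimal base-R sketch (no dplyr needed) that reuses the new_values vector from the question; the variable names kept and dropped are only illustrative.

```r
new_values <- c(period = "day", n = 365)

# Subsetting with `[` keeps the element's name, which select() then
# combines with the outer name ("time") to give "time...period"
kept <- names(new_values[1])
kept
#> [1] "period"

# After unname() there is no inner name left to combine
dropped <- names(unname(new_values)[1])
dropped
#> NULL
```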
Solution
While unname() does the job, you're better off just using [[ to extract the value without the name (period)...
#                                       |---|
data %>% dplyr::select(time = new_values[[1]])
#>   time
#> 1    1
#                                       |----------|
data %>% dplyr::select(time = new_values[["period"]])
#>   time
#> 1    1
...or better yet, making new_values a list, so its values (like 365) are not all coerced to strings (like "365") in a character vector:
# Original 'new_values' as a vector...
new_values <- c(period = "day", n = 365)
new_values
#> period      n 
#>  "day"  "365" 
# ...and new 'new_values' as a list:
new_values <- list(period = "day", n = 365)
new_values
#> $period
#> [1] "day"
#> 
#> $n
#> [1] 365
# Easily select() what you want:        |-----|
data %>% dplyr::select(time = new_values$period)
#>   time
#> 1    1
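If you prefer to make it explicit that the column name comes from outside the data frame, one option is to wrap the extracted string in all_of(). This is a sketch under the assumption that dplyr is installed and that new_values is the list version shown above; the result variable is only there for illustration.

```r
library(dplyr)

data <- data.frame(day = 1, week = 2, n_in_year = 365)
new_values <- list(period = "day", n = 365)

# new_values$period is the bare string "day" (no inner name), and
# all_of() marks it as an external character vector of column names
result <- data %>% select(time = all_of(new_values$period))

names(result)
#> [1] "time"
```

Because the string passed to all_of() is unnamed, only the outer name (time) is applied to the selected column.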
Answer 2
Score: 1
Please check the updated code below:
# dplyr ‘1.1.2’
# R Version 4.3.0
data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)
new_values <- c(period = "day",
                n = 365)
# Extracting with [[1]] drops the name, so the column is simply renamed to "time"
data %>%
  dplyr::select(time = new_values[[1]])
#>   time
#> 1    1
# This gives the same result as selecting first and renaming afterwards
data %>%
  dplyr::select(new_values[[1]]) %>%
  dplyr::rename(time = day)