How to prevent dplyr::select from combining names rather than assigning a new name?

Question

I'm trying to select a column whose name is stored in a named vector and assign it a new name at the same time. However, dplyr appears to combine the names, and I can't find an option in the documentation to stop this.

```r
# dplyr ‘1.1.2’
# R Version 4.3.0
library(dplyr)  # needed for %>%
data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)
new_values <- c(period = "day",
                n = 365)
# This combines the names rather than assigning a new name
data %>%
  dplyr::select(time = new_values[1])
# e.g.
#   time...period
# 1             1
# I want it to behave like this
data %>%
  dplyr::select(new_values[1]) %>%
  dplyr::rename(time = period)
```

Answer 1

Score: 2

Issue

Your issue is that `new_values[1]` still carries its name, `period`, from `names(new_values)`:

```r
data %>% dplyr::select(time = new_values[1])
#>   time...period
#> 1             1
data %>% dplyr::select(time = unname(new_values)[1])
#>   time
#> 1    1
```
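
The difference comes down to how R's subsetting operators treat names, which you can check in base R without dplyr. This small sketch reuses the `new_values` vector from the question and only inspects what each expression returns; it is an illustration, not part of the original answer.

```r
new_values <- c(period = "day", n = 365)  # 365 is coerced to "365"

# `[` subsets the vector and keeps the element's name: c(period = "day")
new_values[1]

# `[[` extracts the bare element, without its name: "day"
new_values[[1]]

# unname() strips all names first, so subsetting with `[` is also unnamed
unname(new_values)[1]
```

Only the first form hands tidyselect a named value, which is why only `time = new_values[1]` produces `time...period`.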

This behavior is intentional: it is part of the "tidy selection" syntax that `dplyr::select()` uses. When the value on the right of `time =` is itself named, as `new_values[1]` is (`c(period = "day")`), tidyselect combines the outer name with the inner one, using `...` as a separator. This lets programmatic users "combine" and "propagate" column names through nested selections. The tidyselect documentation illustrates this with symbols rather than strings:

```r
mtcars %>% select_loc(foo = c(mpg, cyl))
#> foo1 foo2
#>    1    2

mtcars %>% select_loc(foo = c(bar = mpg, baz = cyl))
#> foo...bar foo...baz
#>         1         2

mtcars %>% select_loc(foo = c(bar = c(mpg, cyl)))
#> foo...bar1 foo...bar2
#>          1          2
```
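
To reproduce this outside the documentation, `select_loc()` can be approximated with `tidyselect::eval_select()`, which returns the selected column positions named with the names the columns would end up with. This is a sketch assuming the tidyselect and rlang packages are installed; `select_loc()` here is a helper defined only for illustration (modelled on the `eval_select()` examples), not an exported function. It also runs the question's string-based selection through the same machinery.

```r
library(tidyselect)
library(rlang)

# Illustrative helper (not exported by any package): evaluate a tidy
# selection and return the named column positions it resolves to.
select_loc <- function(data, ...) {
  eval_select(expr(c(...)), data)
}

# Symbols, as in the documentation: outer and inner names are combined
select_loc(mtcars, foo = c(mpg, cyl))
select_loc(mtcars, foo = c(bar = mpg, baz = cyl))

# Strings from the question's named vector go through the same rules
data <- data.frame(day = 1, week = 2, n_in_year = 365)
new_values <- c(period = "day", n = 365)
select_loc(data, time = new_values[1])    # inner name "period" is kept
select_loc(data, time = new_values[[1]])  # `[[` drops it, leaving "time"
```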

Solution

While `unname()` does the job, you're better off using `[[` to extract the value without its name (`period`)...

```r
# `[[` by position drops the name...
data %>% dplyr::select(time = new_values[[1]])
#>   time
#> 1    1
# ...and so does `[[` by name
data %>% dplyr::select(time = new_values[["period"]])
#>   time
#> 1    1
```

...or, better yet, make `new_values` a list, so that its values (like `365`) are not all coerced to strings (like `"365"`) as they are in a character vector:

```r
# Original 'new_values' as a vector...
new_values <- c(period = "day", n = 365)
new_values
#> period n
#> "day" "365"
# ...and new 'new_values' as a list:
new_values <- list(period = "day", n = 365)
new_values
#> $period
#> [1] "day"
#>
#> $n
#> [1] 365
# Easily select() what you want with `$`:
data %>% dplyr::select(time = new_values$period)
#>   time
#> 1    1
```
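
If you do this in several places, the `[[` extraction can be wrapped in a small function. The sketch below is only an illustration: `select_as()` is a hypothetical helper, not part of dplyr, and it assumes `lookup` is a named vector or list whose `key` element holds an existing column name. Instead of an outer `time =`, it passes a named character vector to `all_of()`; tidyselect treats those names as the new column names, so there is no second name to combine with.

```r
library(dplyr)

data <- data.frame(day = 1, week = 2, n_in_year = 365)

# Hypothetical helper (not part of dplyr): select the column whose name is
# stored in lookup[[key]] and rename it to `new_name` in one step.
select_as <- function(data, lookup, key, new_name) {
  col <- lookup[[key]]                # `[[` returns the bare string, e.g. "day"
  renamed <- setNames(col, new_name)  # e.g. c(time = "day")
  select(data, all_of(renamed))       # a named all_of() selects and renames
}

# Works with the original named character vector...
select_as(data, c(period = "day", n = 365), "period", "time")
# ...and with the list version
select_as(data, list(period = "day", n = 365), "period", "time")
```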

Answer 2

Score: 1

Please check the updated code below, which uses `[[` instead of `[` so the name is dropped:

```r
# dplyr ‘1.1.2’
# R Version 4.3.0
library(dplyr)  # needed for %>%
data <- data.frame(day = 1,
                   week = 2,
                   n_in_year = 365)
new_values <- c(period = "day",
                n = 365)
# `[[` drops the name "period", so the column is simply called "time"
data %>%
  dplyr::select(time = new_values[[1]])
#>   time
#> 1    1
# Two-step alternative: select the column, then rename it
data %>%
  dplyr::select(new_values[[1]]) %>%
  dplyr::rename(time = day)
```

huangapple
  • This article was published on June 27, 2023 at 20:14:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76564753.html