R data.table dynamic column name of group by returning new table

Question

By default, a group-by operation on a data.table returns a new data.table with an automatically named column, V1:

    library(data.table)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
    dt[, mean(a), by = id]
    #     id   V1
    #  1:  1 48.2
    #  2:  2 47.9
    #  3:  3 46.8
    #  4:  4 54.7
    #  5:  5 63.7
    #  6:  6 50.6
    #  7:  7 43.3
    #  8:  8 52.7
    #  9:  9 45.4
    # 10: 10 51.7

Following this post, I can set the name of the result column like so:

    dt[, list(mean = mean(a)), by = id]

Is it possible to use a variable for the column name? E.g., instead of setting mean explicitly, I would like to do something like:

    column_name <- "mean"
    dt[, list(column_name = mean(a)), by = id] # resulting column name is column_name (and not mean)
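The behaviour in the last line is not specific to data.table: argument names passed to list() are taken literally and never evaluated. A minimal base-R sketch of the difference (the variable names literal and dynamic are only illustrative):

```r
column_name <- "mean"

# The name on the left of `=` is used as written; the variable is not looked up.
literal <- names(list(column_name = 1))

# setNames() takes the names as an ordinary argument, so the variable is evaluated.
dynamic <- names(setNames(list(1), column_name))

print(literal)
print(dynamic)
```

Here literal is "column_name" and dynamic is "mean", which is why the answers below attach the name after the list has been built.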

Answer 1

Score: 1

We can use setNames() to name the list returned in j:

    library(data.table)
    dt[, setNames(list(mean(a)), column_name), by = id]
    #     id mean
    #  1:  1 56.8
    #  2:  2 50.5
    #  3:  3 50.5
    #  4:  4 42.4
    #  5:  5 49.9
    #  6:  6 47.8
    #  7:  7 60.6
    #  8:  8 57.4
    #  9:  9 54.6
    # 10: 10 34.5

data

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
    column_name <- "mean"
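The same idea can be wrapped in a small function so the name is supplied by the caller. This is only a sketch: mean_by_id and the name "avg_a" are hypothetical, and it assumes the table has no column that is itself called column_name, since that column would take precedence inside j.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical helper: group means of `a`, with the result column named by the caller.
mean_by_id <- function(x, column_name) {
  x[, setNames(list(mean(a)), column_name), by = id]
}

res <- mean_by_id(dt, "avg_a")
print(names(res))

# Several result columns work the same way: pass one name per list element.
res2 <- dt[, setNames(list(mean(a), sd(a)), c("avg_a", "sd_a")), by = id]
print(names(res2))
```

The first call should give the columns id and avg_a, the second id, avg_a and sd_a.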

Answer 2

Score: 1

We can use setnames() from data.table to rename the default V1 column of the result; the trailing [] prints the renamed table:

    library(data.table)
    setnames(dt[, .(mean(a)), by = id], 'V1', column_name)[]
    #     id mean
    #  1:  1 56.8
    #  2:  2 50.5
    #  3:  3 50.5
    #  4:  4 42.4
    #  5:  5 49.9
    #  6:  6 47.8
    #  7:  7 60.6
    #  8:  8 57.4
    #  9:  9 54.6
    # 10: 10 34.5

data

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
    column_name <- "mean"
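Since setnames() renames by reference, it may be clearer to store the aggregated table first and rename it in a second step. A minimal sketch (the variable res is only illustrative):

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
column_name <- "mean"

# The grouped result is a new table whose value column gets the default name V1.
res <- dt[, .(mean(a)), by = id]

# setnames() modifies res in place, so no reassignment is needed.
setnames(res, "V1", column_name)

print(names(res))
print(names(dt))   # the original table is not touched
```

After this, res has the columns id and mean, while dt still has a, b and id.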

Answer 3

Score: 1

For the sake of completeness, you could also use a functional that returns a named list, such as Map(). Because the first argument, i, is an unnamed character vector, Map() uses its values as the names of the result:

    dt[
      , Map(
          function(i) {
            mean(a)
          }
        , i = "Mean"
      )
      , by = id
    ]

Or for 2+ function calls/columns:

    dt[
      , Map(
          function(i, fun) {
            # call each function on column a within the current group
            do.call(
              fun
              , list(a)
            )
          }
        , i = c("Mean", "SD")
        , fun = c(mean, sd)
      )
      , by = id
    ]
    #     id Mean       SD
    #  1:  1 56.8 29.23012
    #  2:  2 50.5 26.18842
    #  3:  3 50.5 24.82047
    #  4:  4 42.4 34.72495
    #  5:  5 49.9 26.99979
    #  6:  6 47.8 28.35411
    #  7:  7 60.6 31.52142
    #  8:  8 57.4 32.22904
    #  9:  9 54.6 27.90141
    # 10: 10 34.5 30.94529
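A possibly simpler variant of the same idea, not taken from the answer above, keeps names and functions together in one named list and uses lapply(), which preserves the list's names. The list funs is an assumption made for illustration:

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Illustrative named list: each name becomes a result column.
funs <- list(Mean = mean, SD = sd)

# lapply() returns a list with the same names as funs, one element per function.
res <- dt[, lapply(funs, function(f) f(a)), by = id]
print(names(res))
```

The result should have the columns id, Mean and SD, and adding another summary only requires one more entry in funs.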

huangapple
  • Posted on January 3, 2020 at 17:44:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59576235.html