R data.table dynamic column name of group by returning new table
Question
By default, a group-by operation on a data.table returns a new data.table with an automatically named column V1:
library(data.table)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
dt[, mean(a), by = id]
# id V1
# 1: 1 48.2
# 2: 2 47.9
# 3: 3 46.8
# 4: 4 54.7
# 5: 5 63.7
# 6: 6 50.6
# 7: 7 43.3
# 8: 8 52.7
# 9: 9 45.4
# 10: 10 51.7
Following this post, I can set the name of the result column like so:
dt[, list(mean = mean(a)), by = id]
Is it possible to take the column name from a variable? For example, instead of writing mean explicitly, I would like to do something like:
column_name <- "mean"
dt[, list(column_name = mean(a)), by = id] # resulting column name is column_name (and not mean)
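To see why the attempt above does not work, a minimal sketch (assuming the data.table package is installed; the seed and the extra max(b) column are only for illustration) contrasting the two naming rules involved: unnamed expressions in j are named V1, V2, ..., while argument names inside list() or .() are taken literally and never evaluated, so the variable column_name is never looked up.

```r
library(data.table)

set.seed(123)  # fixed seed only to make the sketch reproducible
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
column_name <- "mean"

# Unnamed expressions in j are named automatically: V1, V2, ...
auto <- dt[, .(mean(a), max(b)), by = id]
print(names(auto))  # "id" "V1" "V2"

# Argument names inside list() / .() are literal symbols, not evaluated,
# so the value of the variable column_name is ignored here
literal <- dt[, list(column_name = mean(a)), by = id]
print(names(literal))  # "id" "column_name"
```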
Answer 1
Score: 1
We can use setNames() to attach the name stored in column_name to the list returned in j:
library(data.table)
dt[, setNames(list(mean(a)), column_name), by = id]
# id mean
# 1: 1 56.8
# 2: 2 50.5
# 3: 3 50.5
# 4: 4 42.4
# 5: 5 49.9
# 6: 6 47.8
# 7: 7 60.6
# 8: 8 57.4
# 9: 9 54.6
#10: 10 34.5
Data used for the output above:
set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
column_name <- "mean"
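The same setNames() pattern should extend to several summary columns at once, since setNames() just names whatever list j returns. A minimal sketch, assuming data.table is installed; out_names and its values are hypothetical names chosen for illustration:

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical output names, held in a variable rather than typed into j
out_names <- c("mean_a", "sd_a")

# setNames() names the list that j returns; data.table turns each
# list element into a column and keeps those names
res <- dt[, setNames(list(mean(a), sd(a)), out_names), by = id]
print(names(res))  # "id" "mean_a" "sd_a"
```

The only requirement is that out_names has one entry per element of the list.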
Answer 2
Score: 1
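We can use setnames() from data.table to rename the automatically generated V1 column after grouping. setnames() renames by reference and returns the table invisibly, so the trailing [] is what prints the result: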
library(data.table)
setnames(dt[, .(mean(a)), by = id], 'V1', column_name)[]
# id mean
# 1: 1 56.8
# 2: 2 50.5
# 3: 3 50.5
# 4: 4 42.4
# 5: 5 49.9
# 6: 6 47.8
# 7: 7 60.6
# 8: 8 57.4
# 9: 9 54.6
#10: 10 34.5
Data used for the output above:
set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
column_name <- "mean"
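With more than one unnamed expression in j, the auto-generated names are V1, V2, ..., and setnames() can rename them all in one call. A minimal sketch, assuming data.table is installed; column_names is a hypothetical variable. Storing the result first keeps it clear that only the new table is renamed, never dt itself:

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical target names for the two summary columns
column_names <- c("Mean", "SD")

# Unnamed expressions in j come back as V1 and V2
res <- dt[, .(mean(a), sd(a)), by = id]

# setnames() renames res by reference (no copy) and returns it invisibly
setnames(res, c("V1", "V2"), column_names)
print(names(res))  # "id" "Mean" "SD"
```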
Answer 3
Score: 1
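For the sake of completeness, you could also use a loop-style construct that returns a named list, for example Map(). Because i is a character vector, Map() uses its values as the names of the returned list, and those become the column names: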
dt[
, Map(
function(i) {
mean(a)
}
, i = "Mean"
)
, by = id
]
Or, for two or more function calls (one column each):
dt[
, Map(
function(i, fun) {
do.call(
fun
, list(a)
)
}
, i = c("Mean", "SD")
, fun = c(mean, sd)
)
, by = id
]
# id Mean SD
# 1: 1 56.8 29.23012
# 2: 2 50.5 26.18842
# 3: 3 50.5 24.82047
# 4: 4 42.4 34.72495
# 5: 5 49.9 26.99979
# 6: 6 47.8 28.35411
# 7: 7 60.6 31.52142
# 8: 8 57.4 32.22904
# 9: 9 54.6 27.90141
# 10: 10 34.5 30.94529
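A more compact variant of the same idea, offered as a sketch rather than the answer's own method: build a named list of functions (names taken from a variable) and lapply() over it inside j, since lapply() preserves the list names. This assumes data.table is installed; funs and column_names are hypothetical names.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical output names stored in a variable
column_names <- c("Mean", "SD")

# Named list of summary functions; its names become the result column names
funs <- setNames(list(mean, sd), column_names)

# lapply() over a named list returns a named list, which j turns into columns;
# the anonymous function sees column a because it is created inside j
res <- dt[, lapply(funs, function(f) f(a)), by = id]
print(names(res))  # "id" "Mean" "SD"
```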