January 3, 2020, 17:44:16

R data.table dynamic column name of group by returning new table

Question

By default, a group-by operation on a data.table returns a new data.table with an automatically named column V1:

library(data.table)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
dt[, mean(a), by = id]
#     id   V1
#  1:  1 48.2
#  2:  2 47.9
#  3:  3 46.8
#  4:  4 54.7
#  5:  5 63.7
#  6:  6 50.6
#  7:  7 43.3
#  8:  8 52.7
#  9:  9 45.4
# 10: 10 51.7

Following this post, I can set the name of the result column like so:

dt[, list(mean = mean(a)), by = id]

Is it possible to use a variable for the column name? For example, instead of setting mean explicitly, I would like to do something like this:

column_name <- "mean"
dt[, list(column_name = mean(a)), by = id]  # resulting column is named column_name, not mean

Answer 1

Score: 1

We can use setNames() to name the list returned in j from the variable:

library(data.table)
dt[, setNames(list(mean(a)), column_name), by = id]
#     id mean
#  1:  1 56.8
#  2:  2 50.5
#  3:  3 50.5
#  4:  4 42.4
#  5:  5 49.9
#  6:  6 47.8
#  7:  7 60.6
#  8:  8 57.4
#  9:  9 54.6
# 10: 10 34.5

Data

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
column_name <- "mean"

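The same setNames() idea extends to several result columns at once. The following is a minimal sketch, not part of the original answer; the vector column_names and its values are hypothetical and chosen only for illustration.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical output names, kept in a variable instead of being typed into j
column_names <- c("mean_a", "sd_a")

# j returns one list per group; setNames() attaches the names from the variable
res <- dt[, setNames(list(mean(a), sd(a)), column_names), by = id]

names(res)
```

The names vector must have the same length as the list it is applied to, otherwise some columns end up without the intended name.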
Answer 2

Score: 1

We can use setnames() from data.table to rename the default V1 column after grouping:

library(data.table)
setnames(dt[, .(mean(a)), by = id], 'V1', column_name)[]
#     id mean
#  1:  1 56.8
#  2:  2 50.5
#  3:  3 50.5
#  4:  4 42.4
#  5:  5 49.9
#  6:  6 47.8
#  7:  7 60.6
#  8:  8 57.4
#  9:  9 54.6
# 10: 10 34.5

Data

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
column_name <- "mean"

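Because setnames() renames by reference and returns its result invisibly, the one-liner above needs the trailing [] to print. A two-step variant may be easier to read; this is only a sketch, and the intermediate variable res is an assumption introduced for illustration.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
column_name <- "mean"

# Step 1: aggregate; the unnamed result column is auto-named V1
res <- dt[, .(mean(a)), by = id]

# Step 2: rename in place; no reassignment is needed because
# setnames() modifies res by reference
setnames(res, "V1", column_name)

names(res)
```

Only the aggregated table res is renamed here; the original dt is left unchanged.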
Answer 3

Score: 1

For the sake of completeness, you could also use a loop that returns a named list, for example with Map(). Map() names its result after its first argument when that argument is an unnamed character vector, so the string passed as i becomes the column name:

dt[
  , Map(
    function(i) {
      mean(a)
    }
    , i = "Mean"
  )
  , by = id
]

Or for two or more function calls/columns:

dt[
  , Map(
    function(i, fun) {
      do.call(
        fun
        , list(a)
      )
    }
    , i = c("Mean", "SD")
    , fun = c(mean, sd)
  )
  , by = id
]
#     id Mean       SD
#  1:  1 56.8 29.23012
#  2:  2 50.5 26.18842
#  3:  3 50.5 24.82047
#  4:  4 42.4 34.72495
#  5:  5 49.9 26.99979
#  6:  6 47.8 28.35411
#  7:  7 60.6 31.52142
#  8:  8 57.4 32.22904
#  9:  9 54.6 27.90141
# 10: 10 34.5 30.94529

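A related variant, sketched here as an assumption rather than as part of the original answer, keeps the names and the functions together in one named list and loops over it with lapply(), which preserves the list names. The variable funs is hypothetical.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Hypothetical named list: each name becomes a result column,
# each element is the function applied to column a
funs <- list(Mean = mean, SD = sd)

# lapply() keeps the names of funs, so j returns a named list per group
res <- dt[, lapply(funs, function(f) f(a)), by = id]

names(res)
```

Pairing each name with its function in a single object avoids keeping two parallel vectors in sync, as the Map() version has to.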