Improving performance while looping over a data.table with set()
Question
I wonder if there is a better way to code the following to improve performance.
My real data set has 120k IDs with 25 rows each.
I would like to apply an exponential forecast (simple exponential smoothing) row-wise within each ID.
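Written out, the recurrence that the loop below computes for each ID $g$ with rows $i = 1, \dots, n_g$ is the following, where $V$ is `Value`, $S$ is `SES` and $\alpha = 0.3$ in the code:

```latex
S_{g,1} = 0, \qquad
S_{g,i} = \operatorname{round}\bigl(\alpha \, V_{g,i} + (1 - \alpha) \, S_{g,i-1}\bigr),
\quad i = 2, \dots, n_g
```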
library(data.table)

#dummy data set (seed set so the example is reproducible)
set.seed(42)
dt <- data.table(
  ID    = rep(c("A", "B"), each = 5),
  Value = abs(round(rnorm(10) * 10))
)
#Initialize column; the 1st row of each ID keeps SES = 0
dt[, SES := 0]
#split by ID into list to loop over with lapply
dt <- split(dt, dt$ID)
#function to loop with: SES[i] = round(alpha * Value[i] + (1 - alpha) * SES[i - 1])
alpha <- 0.3
loop_function <- function(x) {
  # rows 2..n of this ID (5 in the dummy data, 25 in the real data)
  for (i in seq_len(nrow(x))[-1L]) {
    set(x, i, "SES", round(x[i, alpha * Value] + x[i - 1L, (1L - alpha) * SES], 0))
  }
  return(x)
}
#apply function to list elements and bind result
dt <- lapply(dt, loop_function)
dt <- rbindlist(dt)
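One alternative, not taken from the thread, is to express the per-ID recurrence as a left fold with base R's `Reduce(..., accumulate = TRUE)` and run it once per group with `by = ID`. This avoids calling `[.data.table` twice for every single row, which is likely where most of the loop's time goes; treat it as a sketch to benchmark on the real 120k-ID data, not as a measured improvement. The helper name `ses_step` is hypothetical.

```r
library(data.table)

set.seed(42)                      # reproducible dummy data
dt <- data.table(
  ID    = rep(c("A", "B"), each = 5),
  Value = abs(round(rnorm(10) * 10))
)
alpha <- 0.3

# Hypothetical helper: one smoothing step,
# SES[i] = round(alpha * Value[i] + (1 - alpha) * SES[i - 1])
ses_step <- function(prev, v) round(alpha * v + (1 - alpha) * prev)

# Left fold per ID: init = 0 is the first row's SES, then ses_step is
# applied to Value[2], Value[3], ... carrying the previous SES forward
dt[, SES := Reduce(ses_step, Value[-1L], accumulate = TRUE, init = 0), by = ID]
print(dt)
```

Because `init = 0` becomes the first element of the accumulated vector, each group gets exactly one value per row, with the first row fixed at 0 as in the question.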
Answer 1
Score: 2
This should be much faster:
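library(data.table)
#dummy data set
set.seed(42)
dt <- data.table(
  ID    = rep(c("A", "B"), each = 5),
  Value = abs(round(rnorm(10) * 10))
)
alpha <- 0.3
#Initialize column and value for 1st row
dt[, SES := 0]
# Create a per-ID row index and iterate over it; each step updates row i of every ID at once.
# Assumes every ID has the same number of rows and rows are sorted by ID,
# so that prev lines up with the rows selected by idx == i.
dt[, idx := rowid(ID)]
for (i in 2:max(dt$idx)) {
  prev <- dt[idx == (i - 1L), SES]
  dt[idx == i, SES := round(alpha * Value + (1L - alpha) * prev, 0)]
}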
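In the end it is pretty similar to your idea: it still iterates over the row index (2:5 here, 2:25 for your real data), but each iteration updates that row for all IDs at once in a single vectorized data.table call, so the loop runs once per row position instead of once per row of every ID. Hope this helps.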
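The same idea can be pushed a step further, as a sketch under stronger assumptions: if every ID has the same number of rows and each ID's rows are contiguous and in time order (true for the dummy data, and the question states 25 rows for each of the 120k IDs), the values can be laid out as a matrix with one column per ID. The loop then runs over row positions only, and each step is a plain vector operation with no `idx == i` subsetting of the data.table. The names `V`, `S` and `n` are illustrative.

```r
library(data.table)

set.seed(42)
dt <- data.table(
  ID    = rep(c("A", "B"), each = 5),
  Value = abs(round(rnorm(10) * 10))
)
alpha <- 0.3

# Assumptions checked here: each ID's rows form one contiguous run,
# and all IDs have the same number of rows n
stopifnot(max(rleid(dt$ID)) == uniqueN(dt$ID))
n <- dt[, .N, by = ID][, unique(N)]
stopifnot(length(n) == 1L)

# One column per ID: matrix() fills column by column, matching dt's row order
V <- matrix(dt$Value, nrow = n)
S <- matrix(0, nrow = n, ncol = ncol(V))   # row 1 stays 0, as in the question

# Loop over row positions only; each step is a vector operation over all IDs
for (i in seq_len(n)[-1L]) {
  S[i, ] <- round(alpha * V[i, ] + (1 - alpha) * S[i - 1L, ])
}

dt[, SES := as.vector(S)]   # column-major flattening restores dt's row order
print(dt)
```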