How can I vectorize the loop in this sapply call?
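Question
I have found that the most expensive part of my R code is the following `sapply` call:
```r
L <- 2000                                  # number of observations
score <- sample(1:3, L, replace = TRUE)    # each score is 1, 2, or 3
d <- c(0, -1, 0.5)
# For each i, sum d from index 1 up to index score[i]
sapply(1:L, function(i) sum(d[1:score[i]]))
```
That call sums the elements of the vector `d` from index 1 to index `score[i]`, looping over every element of `score`. The challenge is that this code is evaluated inside an optimization routine, so it runs many, many times.
I am trying to perform the same computation in a vectorized way, but I am struggling a bit. I suppose I could create a matrix like this:
```r
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)
```
and then somehow compute `rowSums(d.mat)`, but only from column 1 to column `score[i]` in row `i`. Does anyone know a way to do that without looping? I imagine it would be much faster than `sapply`, if it is possible at all, given the relative speed of `rowSums` in the following benchmark:
```r
library(microbenchmark)
microbenchmark(sapply(1:L, function(i) sum(d[1:score[i]])),
               rowSums(d.mat),
               times = 100)
```
Or perhaps someone sees a better third option.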
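For illustration, the matrix idea described in the question could be sketched roughly as follows. This is only a sketch under assumptions: the names `mask`, `masked`, and `loop` are hypothetical, `set.seed` is added purely for reproducibility, and every score is assumed to be an integer between 1 and `length(d)`. It zeroes out the columns beyond `score[i]` in each row before calling `rowSums`.

```r
set.seed(1)                                # assumption: fixed seed for reproducibility
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)

# col(d.mat) holds the column index of every cell. Because matrices are
# stored column by column, score is recycled down the rows, so cell [i, j]
# is compared with score[i].
mask <- col(d.mat) <= score

# Multiplying by the logical mask turns the unwanted columns into 0
masked <- rowSums(d.mat * mask)

# Reference result from the original loop
loop <- sapply(1:L, function(i) sum(d[1:score[i]]))
stopifnot(isTRUE(all.equal(masked, loop)))
```

This avoids the explicit loop, but it still builds two `L`-by-`length(d)` matrices on every evaluation, so it is not necessarily the cheapest option.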
Answer 1
Score: 9
Index the `cumsum`:
```r
microbenchmark::microbenchmark(
  sapply = sapply(1:L, function(i) sum(d[1:score[i]])),
  index = cumsum(d)[score],
  check = "equal"
)
#> Unit: microseconds
#>    expr    min      lq     mean  median     uq    max neval
#>  sapply 2494.8 2698.00 3232.279 2805.35 3516.2 6868.4   100
#>   index    4.3    5.05    8.682    6.90    8.9   60.2   100
```
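The reason this works is that `cumsum(d)[k]` is by definition `sum(d[1:k])`, so the partial sums only need to be computed once and can then be looked up by `score`. A minimal check of that equivalence might look like the following; the names `partial`, `vectorized`, and `looped` are hypothetical, the seed is added only for reproducibility, and the sketch assumes every score is an integer between 1 and `length(d)`.

```r
set.seed(42)                               # assumption: fixed seed for reproducibility
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)

# All partial sums of d, computed once: 0.0, -1.0, -0.5
partial <- cumsum(d)

# Look up the partial sum that corresponds to each score
vectorized <- partial[score]

# Reference result from the original loop
looped <- sapply(seq_len(L), function(i) sum(d[1:score[i]]))
stopifnot(isTRUE(all.equal(vectorized, looped)))
```

Because `d` has only `length(d)` elements, the cumulative sum costs almost nothing, and the remaining work is a single vector index.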