How can I vectorize the loop in this sapply call?

Question

I have found that the most expensive part of my R code is the following `sapply` call:

```lang-r
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)
sapply(1:L, function(i) sum(d[1:score[i]]))
```

That call sums the vector `d` from index 1 to index `score[i]`, looping over each element of `score`. The challenge is that this code is evaluated as part of an optimization routine and run many, many times.

I am trying to perform the same computation in a vectorized way, but I am struggling a bit. I suppose that I could create a matrix like this:

```lang-r
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)
```

and then somehow compute `rowSums(d.mat)`, but only from column 1 to column `score[i]` in row `i`. Is anyone aware of a way to do that without looping? I imagine it would be much faster than `sapply`, if it is possible at all, given the relative speed of `rowSums` in the following benchmark:

```lang-r
library(microbenchmark)
microbenchmark(sapply(1:L, function(i) sum(d[1:score[i]])),
               rowSums(d.mat),
               times = 100)
```

Or perhaps someone sees a better third option.

Answer 1

Score: 9

Index the cumulative sum of `d` by `score`:

```lang-r
microbenchmark::microbenchmark(
  sapply = sapply(1:L, function(i) sum(d[1:score[i]])),
  index = cumsum(d)[score],
  check = "equal"
)
#> Unit: microseconds
#>    expr    min      lq     mean  median     uq    max neval
#>  sapply 2494.8 2698.00 3232.279 2805.35 3516.2 6868.4   100
#>   index    4.3    5.05    8.682    6.90    8.9   60.2   100
```
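
The reason this works is that `cumsum(d)[k]` equals `sum(d[1:k])`, so the three possible partial sums can be computed once and then looked up for every element of `score`. The following is a minimal sketch of that equivalence, reusing `L`, `score`, and `d` from the question. The seed and the names `prefix`, `by_index`, `by_loop`, and `by_mask` are illustrative assumptions, not part of the original post. It also includes one possible way to write the masked `rowSums` idea from the question, for comparison.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)

# Prefix sums: prefix[k] equals sum(d[1:k])
prefix <- cumsum(d)

# Looking up prefix[score[i]] for every i replaces the per-element loop
by_index <- prefix[score]

# The original loop, kept as the reference result
by_loop <- sapply(1:L, function(i) sum(d[1:score[i]]))

# Masked-matrix variant: zero out the columns beyond score[i] in row i.
# col(d.mat) holds each cell's column index; score is recycled down the rows.
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)
by_mask <- rowSums(d.mat * (col(d.mat) <= score))

# Both vectorized versions should agree with the loop
stopifnot(isTRUE(all.equal(by_index, by_loop)),
          isTRUE(all.equal(by_mask, by_loop)))
```

The lookup only touches a length-3 vector of prefix sums, whereas the masked variant still builds and multiplies an `L` by 3 matrix, so the lookup is likely the cheaper of the two inside an optimization loop.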

huangapple
  • Posted on March 7, 2023 at 09:15:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75657235.html