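How to vectorize the loop in this sapply call?

Question

I have found that the most expensive part of my R code is the following `sapply` call:

```r
L <- 2000
score <- sample(1:3, L, replace = TRUE)  # one score in {1, 2, 3} per element
d <- c(0, -1, 0.5)
# For each i, sum d[1], ..., d[score[i]]
sapply(1:L, function(i) sum(d[1:score[i]]))
```

That call takes the sum of the vector `d` from index 1 to index `score[i]`, looping over each element of `score`. The challenge is that this code is evaluated as part of an optimization routine and run many, many times.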
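To make the computation concrete, here is the same call on a tiny input. The vector `score_small` is a hypothetical toy example chosen for illustration, not part of the original question. Each element of the result is one partial sum of `d`:

```r
d <- c(0, -1, 0.5)
score_small <- c(3, 1, 2)  # hypothetical toy input, not the 2000-element score above

# For each i, add up d[1], ..., d[score_small[i]]
res <- sapply(seq_along(score_small), function(i) sum(d[1:score_small[i]]))
res
#> [1] -0.5  0.0 -1.0
```

Score 3 sums all of `d` (0 - 1 + 0.5 = -0.5), score 1 keeps only `d[1]` (0), and score 2 gives 0 - 1 = -1.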

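I am trying to perform the same computation in a vectorized way, but I'm struggling a bit. I suppose I could create a matrix like this:

```r
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)
```

and then somehow compute `rowSums(d.mat)`, but only from column 1 to column `score[i]` in row `i`. Does anyone know a way to do that without looping? Given the relative speed of `rowSums` in the following benchmark, I imagine that would be much faster than `sapply`, if it is possible at all:

```r
library(microbenchmark)
microbenchmark(sapply(1:L, function(i) sum(d[1:score[i]])),
               rowSums(d.mat),
               times = 100)
```

Or perhaps someone sees a better third option.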
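The `rowSums` idea from the question can be completed without a loop by zeroing every column beyond `score[i]` before summing. The sketch below is an editorial addition, not from the original thread: it recreates the question's setup (the `set.seed()` call is added only for reproducibility), builds a logical mask with base R's `col()`, and checks the result against the `sapply` loop. It allocates several `L`-by-`length(d)` matrices per call, so no speed claim is made for it here.

```r
set.seed(1)  # added only so the example is reproducible
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)

# One copy of d per row, as in the question
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)

# col(d.mat)[i, j] is j; comparing it with score (length L) recycles score
# down each column, so mask[i, j] is TRUE exactly when j <= score[i]
mask <- col(d.mat) <= score

# Zero out the columns past score[i] in each row, then sum the rows
d.masked <- d.mat
d.masked[!mask] <- 0
masked <- rowSums(d.masked)

# Same result as the original loop
looped <- sapply(1:L, function(i) sum(d[1:score[i]]))
stopifnot(isTRUE(all.equal(masked, looped)))
```

Assigning `0` through `d.masked[!mask]` rather than multiplying by the mask keeps the result correct even if `d` contains `Inf`, since `0 * Inf` is `NaN` in R.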


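Answer 1

Score: 9

Index into `cumsum(d)` with `score` instead of looping. The `check = "equal"` argument makes `microbenchmark` verify that both expressions return the same result: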

```r
microbenchmark::microbenchmark(
  sapply = sapply(1:L, function(i) sum(d[1:score[i]])),
  index = cumsum(d)[score],
  check = "equal"
)
#> Unit: microseconds
#>    expr    min      lq     mean  median     uq    max neval
#>  sapply 2494.8 2698.00 3232.279 2805.35 3516.2 6868.4   100
#>   index    4.3    5.05    8.682    6.90    8.9   60.2   100
```
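Why this works: `sum(d[1:k])` is, by definition, the k-th element of `cumsum(d)`, so computing the `length(d)` prefix sums once and subsetting them with `score` yields all `L` results in a single vectorized step. The short check below uses a hypothetical four-element `score` to show the prefix sums for the question's `d`. It also flags one caveat that is an editorial addition, not part of the original answer: if `score` could ever contain 0, `cumsum(d)[0]` selects nothing, so the result silently comes back shorter than `score` (and the original loop would not return an empty sum either, because `1:0` is `c(1, 0)`). Prepending a 0 (`score0` below is also hypothetical) maps a score of 0 to the empty sum.

```r
d <- c(0, -1, 0.5)
cs <- cumsum(d)  # prefix sums: cs[k] == sum(d[1:k])
cs
#> [1]  0.0 -1.0 -0.5

score <- c(2, 3, 1, 3)  # hypothetical short score vector
cs[score]
#> [1] -1.0 -0.5  0.0 -0.5

# Same answer as the original loop
looped <- sapply(seq_along(score), function(i) sum(d[1:score[i]]))
stopifnot(isTRUE(all.equal(cs[score], looped)))

# Caveat: a zero index is dropped, so the result is too short
score0 <- c(0, 2, 3)  # hypothetical input containing a zero
length(cs[score0])
#> [1] 2

# Prepending 0 makes score 0 select the empty sum
c(0, cs)[score0 + 1]
#> [1]  0.0 -1.0 -0.5
```

Inside an optimization routine where `d` changes between evaluations, `cumsum(d)` simply has to be recomputed on each call, which costs only `length(d)` additions before the `L`-element subset.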
