2023年2月14日 01:39:40go评论73阅读模式

英文:

How is R able to sum an integer sequence so fast?

问题

创建一个大的连续整数序列：

x <- 1:1e20

R如何能够如此快速地计算sum？

sum(x)

它不需要循环遍历1e20个向量元素并求和吗？

英文:

Create a large contiguous sequence of integers:

x &lt;- 1:1e20

How is R able to compute the sum so fast?

sum(x)

Doesn't it have to loop over 1e20 elements in the vector and sum each element?

答案1

得分: 6

总结评论：

R引入了一种称为ALTREP（ALternate REPresentation）的东西，用于处理R对象。其目的是更高效地执行一些操作。从https://www.r-project.org/dsc/2017/slides/dsc2017.pdf，一些示例包括：

允许向量数据存储在内存映射文件或分布式系统中
允许紧凑的表示算术序列
允许向对象添加元数据
允许延迟进行计算/分配
支持环境的替代表示

第二和第四点似乎在这里适用。

我们可以通过查看我推测是R sum原始函数的核心部分，位于https://github.com/wch/r-source/blob/7c0449d81c853f781fb13e9c7118065aedaf2f7f/src/main/altclasses.c#L262中，来看到这一点：

static SEXP compact_intseq_Sum(SEXP x, Rboolean narm)
{
#ifdef COMPACT_INTSEQ_MUTABLE
    /* If the vector has been expanded it may have been modified. */
    if (COMPACT_SEQ_EXPANDED(x) != R_NilValue) 
	return NULL;
#endif
    double tmp;
    SEXP info = COMPACT_SEQ_INFO(x);
    R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
    R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
    int inc = COMPACT_INTSEQ_INFO_INCR(info);
    tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1));
    if(tmp &gt; INT_MAX || tmp &lt; R_INT_MIN)
	/**** check for overflow of exact integer range? */
	return ScalarReal(tmp);
    else
	return ScalarInteger((int) tmp);
}

换句话说，对于没有间隔的整数序列的减少是微不足道的。当存在间隔或NA时，事情变得更加复杂。

在实际操作中：

vec &lt;- 1:1e10
sum(vec)
# [1] 5e+19
sum(vec[-10])
# Error: cannot allocate vector of size 37.3 Gb
### win11, R-4.2.2

理想情况下，我们会看到sum(vec) == (sum(vec[-10]) + 10)，但由于我们不能使用序列求和的优化，所以不能这样做。

英文:

Summing up the comments:

R introduced something called ALTREP, or ALternate REPresentation for R objects. Its intent is to do some things more efficiently. From https://www.r-project.org/dsc/2017/slides/dsc2017.pdf, some examples include:

allow vector data to be in a memory-mapped file or distributed
allow compact representation of arithmetic sequences;
allow adding meta-data to objects;
allow computations/allocations to be deferred;
support alternative representations of environments.

The second and fourth bullets seem appropriate here.

We can see a hint of this in action by looking at what I'm inferring is at the core of the R sum primitive for altreps, at https://github.com/wch/r-source/blob/7c0449d81c853f781fb13e9c7118065aedaf2f7f/src/main/altclasses.c#L262:

static SEXP compact_intseq_Sum(SEXP x, Rboolean narm)
{
#ifdef COMPACT_INTSEQ_MUTABLE
    /* If the vector has been expanded it may have been modified. */
    if (COMPACT_SEQ_EXPANDED(x) != R_NilValue) 
	return NULL;
#endif
    double tmp;
    SEXP info = COMPACT_SEQ_INFO(x);
    R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
    R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
    int inc = COMPACT_INTSEQ_INFO_INCR(info);
    tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1));
    if(tmp &gt; INT_MAX || tmp &lt; R_INT_MIN)
	/**** check for overflow of exact integer range? */
	return ScalarReal(tmp);
    else
	return ScalarInteger((int) tmp);
}

Namely, the reduction of an integer sequence without gaps is trivial. It's when there are gaps or NAs that things become a bit more complicated.

In action:

vec &lt;- 1:1e10
sum(vec)
# [1] 5e+19
sum(vec[-10])
# Error: cannot allocate vector of size 37.3 Gb
### win11, R-4.2.2

Where ideally we would see that sum(vec) == (sum(vec[-10]) + 10), but we cannot since we can't use the optimization of sequence-summing.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

R如何如此迅速地对整数序列求和？

问题

答案1

生产者消费者队列最佳实践和性能

分割和汇总结果，无需循环。

线性回归使用带有缺失值的数据集

你可以根据在R中的字符串是否包含特定值来改变数据。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论