What is the most memory efficient way to append items to an existing list in R?

Question

I have a list in R, my_list2 in the example below.

I want to add items to the list in a way that minimises the peak RAM usage.

Is there a more memory efficient way to do this than using the append function?

I'm aware that it's best practice to create an 'empty' list and then fill it, as per my_list in the example below, but this isn't an option as the list already exists.

```R
# If I could create the list from scratch I'd do it like this:
my_list <- vector('list', 10)
for (i in 1:10) {
  my_list[[i]] <- i
}

# Is there a better way than the 'append' function?
my_list2 <- list(1)
for (i in 2:10) {
  my_list2 <- append(my_list2, i)
}
```

Answer 1

Score: 5

Rather than using append() in each iteration, you could create a temporary list and append it to my_list2 only once at the end. Would this do the job for you?

Here's an example with 5k iterations in the for loop:

```R
my_list <- list(1)
my_list2 <- list(1)

bench::mark(
  orig = {
    for (i in 2:5000) {
      my_list <- append(my_list, i)
    }
    my_list
  },
  mine = {
    tmp <- vector("list", 4999)
    for (i in 1:4999) {
      tmp[[i]] <- i + 1
    }
    append(my_list2, tmp)
  },
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 orig       420.01ms    1.69s     0.567    95.7MB     13.6
#> 2 mine         1.52ms      2ms   406.       96.8KB      0
```

Note that bench::mark() automatically checks that both codes give the same output.
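
If you ever need to benchmark expressions whose results are intentionally different, that check can be turned off via the `check` argument; a minimal sketch (the two expressions here are just placeholders):

```R
library(bench)

# With check = FALSE, bench::mark() no longer requires all expressions
# to return identical results, so deliberately different outputs are fine.
bench::mark(
  vec = seq_len(10),           # integer vector 1..10
  lst = as.list(seq_len(10)),  # same values, stored as a list
  check = FALSE,
  iterations = 10
)
```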

Answer 2

Score: 1

A practical solution with low peak RAM usage can look like this:
```R
my_list <- list(1)
N <- length(my_list)
length(my_list) <- N + 9
for (i in 2:10) {
  my_list[[N + i - 1]] <- i
  # gc()  # optional
}
```
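
For context, assigning a longer length with `length<-` pads the list with `NULL` entries up to the new length, so the loop above only fills slots that already exist instead of growing the list on each iteration; a quick illustration:

```R
# length<- pads a list with NULL entries up to the new length.
x <- list(1)
length(x) <- 4
str(x)
#> List of 4
#>  $ : num 1
#>  $ : NULL
#>  $ : NULL
#>  $ : NULL
```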

You can use `gc` to get the **peak RAM usage**, but the result is strongly influenced by whether a garbage collection happened during execution. To see the smallest possible peak, `gctorture` can be turned on, although execution then typically becomes much slower. Because the result can also depend on the order in which the methods are called, I start a fresh vanilla R session for each run.

```R
# Using append
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) L <- append(L, list(sample(n)))
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  344156 18.4     664228 35.5   345174 18.5
#Vcells 1215086  9.3    8388608 64.0  1265554  9.7

# Using [[<-
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) L[[length(L)+1]] <- sample(n)
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346937 18.6     664228 35.5   347919 18.6
#Vcells 1221639  9.4    8388608 64.0  1272088  9.8

# Using [[<-, but resizing the list beforehand
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) L[[N - 1 + i]] <- sample(n)
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346564 18.6     664228 35.5   347498 18.6
#Vcells 1220761  9.4    8388608 64.0  1271479  9.8
```

Here `append` needs 8.0 Mb, while `[[<-` needs 8.2 Mb, regardless of whether the list size is increased beforehand or not.
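
One way to read those figures off the `gc()` output above (a sketch of the arithmetic, assuming the peak is the "max used" Ncells + Vcells after the loop minus the "used" baseline right after `gc(reset=TRUE)`):

```R
# Peak extra memory = ("max used" Ncells Mb + Vcells Mb after the loop)
#                   - ("used" Ncells Mb + Vcells Mb after gc(reset=TRUE))
(18.5 + 9.7) - (15.3 + 4.9)  # append:                         8.0 Mb
(18.6 + 9.8) - (15.3 + 4.9)  # [[<-, with or without resizing: 8.2 Mb
```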


Doing the same, but without `gctorture` and instead calling `gc` manually after each step, gives:

```R
# Using append
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) {L <- append(L, list(sample(n))); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  344145 18.4     664228 35.5   372952 20.0
#Vcells 1215054  9.3    8388608 64.0  1319826 10.1

# Using [[<-
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) {L[[length(L)+1]] <- sample(n); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346926 18.6     664228 35.5   377474 20.2
#Vcells 1221607  9.4    8388608 64.0  1352555 10.4

# Using [[<-, but resizing the list beforehand
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) {L[[N - 1 + i]] <- sample(n); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  347659 18.6     664771 35.6   374526 20.1
#Vcells 1223042  9.4    8388608 64.0  1273592  9.8
```

Here `append` needs 9.9 Mb, `[[<-` without resizing the list in advance needs 10.4 Mb, and with the list resized beforehand 9.7 Mb.

In case you want to know the total amount of allocated (but possibly already freed) memory, or other options, have a look at [Monitor memory usage in R](https://stackoverflow.com/questions/7856306).


