What is the most memory efficient way to append items to an existing list in R?
Question
I have a list in R, `my_list2` in the example below. I want to add items to the list in a way that minimises peak RAM usage. Is there a more memory-efficient way to do this than using the `append` function?

I'm aware that best practice is to create an 'empty' list and then fill it, as with `my_list` in the example below, but this isn't an option because the list already exists.
# If I could create the list from scratch I'd do it like this:
my_list <- vector('list', 10)
for (i in 1:10) {
  my_list[[i]] <- i
}

# Is there a better way than the 'append' function?
my_list2 <- list(1)
for (i in 2:10) {
  my_list2 <- append(my_list2, i)
}
Answer 1
Score: 5
Rather than using `append()` in each iteration, you could create a temporary list and append it to `my_list2` only once at the end. Would this do the job for you?

Here's an example with 5k iterations in the `for` loop:
my_list <- list(1)
my_list2 <- list(1)
bench::mark(
  orig = {
    for (i in 2:5000) {
      my_list <- append(my_list, i)
    }
    my_list
  },
  mine = {
    tmp <- vector("list", 4999)
    for (i in 1:4999) {
      tmp[[i]] <- i + 1
    }
    append(my_list2, tmp)
  },
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 orig 420.01ms 1.69s 0.567 95.7MB 13.6
#> 2 mine 1.52ms 2ms 406. 96.8KB 0
Note that `bench::mark()` automatically checks that both expressions give the same output.
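The same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `append_batch` and `make_item` are hypothetical names introduced here for illustration, not part of base R or of the benchmark above.

```r
# Hypothetical helper: build all new items in a preallocated temporary list,
# then grow the existing list with a single call to append().
append_batch <- function(existing, n_new, make_item) {
  tmp <- vector("list", n_new)       # preallocate the temporary list
  for (i in seq_len(n_new)) {
    tmp[[i]] <- make_item(i)         # fill the temporary list in place
  }
  append(existing, tmp)              # `existing` is combined only once
}

my_list2 <- list(1)
my_list2 <- append_batch(my_list2, 9, function(i) i + 1)
length(my_list2)
```

With this approach the existing list, the temporary list and the combined result all exist at the same moment during the final `append()`, so it mainly saves the repeated copying inside the loop.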
Answer 2
Score: 1
<details>
<summary>English:</summary>
A practical solution with low peak RAM usage could look like this:
my_list <- list(1)
N <- length(my_list)
length(my_list) <- N + 9
for (i in 2:10) {
  my_list[[N + i - 1]] <- i
  #gc() #Optional
}
You can use `gc` to measure the **peak RAM usage**, but the result depends heavily on whether a garbage collection happened during execution. To see the minimum possible peak, you can turn on `gctorture`, although execution then typically becomes much slower. Because the result can be influenced by the order in which the methods are called, I start a new vanilla session each time.
#Using append
n <- 1e5
gctorture(on=TRUE)
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
for (i in 2:10) L <- append(L, list(sample(n)))
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 344156 18.4 664228 35.5 345174 18.5
#Vcells 1215086 9.3 8388608 64.0 1265554 9.7
#Using [[<-
n <- 1e5
gctorture(on=TRUE)
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
for (i in 2:10) L[[length(L)+1]] <- sample(n)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 346937 18.6 664228 35.5 347919 18.6
#Vcells 1221639 9.4 8388608 64.0 1272088 9.8
#Using [[<- but resizing the list before
n <- 1e5
gctorture(on=TRUE)
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) L[[N - 1 + i]] <- sample(n)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 346564 18.6 664228 35.5 347498 18.6
#Vcells 1220761 9.4 8388608 64.0 1271479 9.8
Here `append` needs 8.0 Mb and `[[<-` needs 8.2 Mb, regardless of whether the list is resized beforehand.
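These figures are the sum of the "max used (Mb)" values after the loop minus the same sum right after `gc(reset=TRUE)`, for example (18.5 + 9.7) - (15.3 + 4.9) = 8.0 for `append`. A minimal sketch of that calculation follows; `peak_mb` is a hypothetical helper name, and the code assumes that the last column of the matrix returned by `gc()` holds "max used" in Mb.

```r
# Sketch: peak memory in Mb used while evaluating `expr`, relative to the
# baseline at the start. Assumes the last column of gc() is "max used (Mb)".
peak_mb <- function(expr) {
  before <- gc(reset = TRUE)   # reset the "max used" statistics
  force(expr)                  # evaluate the lazily passed expression now
  after <- gc()
  sum(after[, ncol(after)]) - sum(before[, ncol(before)])
}

n <- 1e5
L <- list(sample(n))
peak_mb(for (i in 2:10) L <- append(L, list(sample(n))))
```

As in the measurements above, the value still depends on when garbage collections happen, so it is best compared across fresh sessions.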
---
Doing the same without `gctorture`, but calling `gc` manually after each step, gives:
#Using append
n <- 1e5
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
for (i in 2:10) {L <- append(L, list(sample(n))); gc()}
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 344145 18.4 664228 35.5 372952 20.0
#Vcells 1215054 9.3 8388608 64.0 1319826 10.1
#Using [[<-
n <- 1e5
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
for (i in 2:10) {L[[length(L)+1]] <- sample(n); gc()}
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 346926 18.6 664228 35.5 377474 20.2
#Vcells 1221607 9.4 8388608 64.0 1352555 10.4
#Using [[<- but resizing the list before
n <- 1e5
set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3 664228 35.5 285638 15.3
#Vcells 633121 4.9 8388608 64.0 633121 4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) {L[[N - 1 + i]] <- sample(n); gc()}
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 347659 18.6 664771 35.6 374526 20.1
#Vcells 1223042 9.4 8388608 64.0 1273592 9.8
Here `append` needs 9.9 Mb, `[[<-` without resizing the list in advance needs 10.4 Mb, and `[[<-` with the list resized beforehand needs 9.7 Mb.
---
If you want to know the total amount of memory allocated (including memory that has since been freed), or want to explore other options, have a look at [Monitor memory usage in R](https://stackoverflow.com/questions/7856306).
</details>