Build bigsparser::SFBM matrix iteratively
Question
I want to build a matrix with 10^7 columns and 2500 rows. Since this is too large for my computer, I thought I could create the matrix iteratively. I would like to use the bigsparser package to store the matrix on disk.
Here is how I create the first matrix:
library(bigsparser)
library(data.table)
library(Matrix)
nvars <- 10000000 # columns
ncons <- 10 # rows
n_nonzero <- round(0.02*nvars*ncons) # approximate; duplicates are dropped below, so there may be fewer values
set.seed(13)
# the first table
Amat <- data.frame(
  i=sample.int(ncons, n_nonzero, replace=TRUE),
  j=sample.int(nvars, n_nonzero, replace=TRUE),
  x=runif(n_nonzero)
)
setDT(Amat)
Amat <- unique(Amat, by=c("i", "j"))
AmatSparse <- sparseMatrix(
  i=Amat[,get("i")], j=Amat[,get("j")], x=Amat[,get("x")],
  dims=c(2500, 10^7L)
)
AmatSFBM <- as_SFBM(AmatSparse, backingfile="sparsemat", compact = FALSE)
As you can see, I know the dimensions of the final matrix beforehand and have set them accordingly.
Now I want to add some rows, like this:
for (iter in 2:250) {
  Amat <- data.frame(
    i=sample.int(ncons, n_nonzero, replace=TRUE),
    j=sample.int(nvars, n_nonzero, replace=TRUE),
    x=runif(n_nonzero)
  )
  setDT(Amat)
  Amat <- unique(Amat, by=c("i", "j"))
  # shift row indices so that 250 blocks of ncons rows fill the 2500 rows
  Amat[,i:=i+(iter-1)*ncons]
  # this does not work:
  AmatSFBM[Amat[,get("i")], Amat[,get("j")]] <- Amat[,get("x")]
}
However, the [<- operator does not seem to work for SFBM objects.
Is there any way to build an SFBM object other than with as_SFBM from a sparse matrix? For example:
- can I add two SFBM objects of the same dimensions?
- can I create an SFBM object from a CSV file or similar?
Either would be fine.
Answer 1
Score: 3
The SFBM class has a method $add_columns() which you can use to grow your matrix iteratively. Generally, when you are memory constrained, it is a good idea to avoid unnecessary intermediate assignments. In the following piece of code I first write a function to generate the component sparse matrices. Then I create a starting matrix and finally add the component matrices one at a time. I've limited it to 9 iterations for this example, but you can just set it to 249 to get your full matrix.
library(bigsparser)
library(data.table)
library(Matrix)
set.seed(13)
# Function to generate one component matrix (a block of columns)
generate_sparse_mat <- \(nrow = 2500, ncol = 40000, n_nonzero = round(0.02*nrow*ncol)) {
  data.table(
    i = sample.int(nrow, n_nonzero, replace = TRUE),
    j = sample.int(ncol, n_nonzero, replace = TRUE),
    x = runif(n_nonzero)
  ) |>
    unique(by = c("i", "j")) |>
    as.list() |>
    c(dims = list(c(nrow, ncol))) |>
    do.call(what = sparseMatrix)
}
# Starting matrix
mat <- generate_sparse_mat() |>
  as_SFBM(compact = FALSE)
# Iteratively add columns
for (k in seq_len(9)) mat$add_columns(generate_sparse_mat(), offset_i = 0)
mat
#> A Sparse Filebacked Big Matrix with 2500 rows and 400000 columns.
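If you want to convince yourself that appending blocks this way gives the same matrix as building it in one go, a small-scale check along the following lines may help. This is only a sketch with toy dimensions: the names blocks, sfbm, ref and v are made up for illustration, and it assumes that sp_prodVec() from bigsparser and rsparsematrix() from Matrix are available in your installed versions.

```r
library(bigsparser)
library(Matrix)

set.seed(1)

# Three small column blocks with the same number of rows (toy sizes).
blocks <- lapply(1:3, function(k) rsparsematrix(5, 4, density = 0.3))

# Start the on-disk matrix from the first block, then append the others.
sfbm <- as_SFBM(blocks[[1]], compact = FALSE)
for (b in blocks[-1]) sfbm$add_columns(b, offset_i = 0)

# In-memory reference built by binding the same blocks column-wise.
ref <- do.call(cbind, blocks)

# Compare the dimensions and a matrix-vector product against the reference.
v <- as.numeric(seq_len(ncol(ref)))
dim_ok  <- all(dim(sfbm) == dim(ref))
prod_ok <- isTRUE(all.equal(sp_prodVec(sfbm, v), as.vector(ref %*% v)))
```

If both dim_ok and prod_ok are TRUE, the appended blocks landed in the expected columns. The in-memory reference is only feasible at toy sizes, which is why the check is done on a 5 x 12 matrix rather than on the full problem.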