Build bigsparser::SFBM matrix iteratively

Question

I wish to build a matrix with 10^7 columns and 2500 rows. Since this is too large for my computer, I thought I could create the matrix iteratively. I would like to use the bigsparser package for storing the matrix on disk.
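To put "too large" into numbers, here is a rough back-of-the-envelope estimate. It assumes the finished matrix would be held in memory as a dgCMatrix (8-byte values, 4-byte row indices, 4-byte column pointers) and that the 2% density from the code below is an upper bound; the variable names are only illustrative.

```r
# Rough size of the full matrix as an in-memory dgCMatrix.
nrows   <- 2500
ncols   <- 1e7
density <- 0.02                      # upper bound; duplicate (i, j) pairs are dropped

nnz <- density * nrows * ncols       # number of stored non-zero values

bytes_x <- 8 * nnz                   # values: doubles
bytes_i <- 4 * nnz                   # row indices: 32-bit integers
bytes_p <- 4 * (ncols + 1)           # column pointers: 32-bit integers

total_gb <- (bytes_x + bytes_i + bytes_p) / 1e9
fits_index_limit <- nnz < .Machine$integer.max

cat(sprintf("non-zeros: %.1e, approx. size: %.2f GB\n", nnz, total_gb))
```

The triplet table used to build the matrix needs additional memory on top of this, which is why building the whole thing in one go is impractical on a typical machine.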

Here is how I create the first matrix:

library(bigsparser)
library(data.table)
library(Matrix)
nvars <- 10000000  # columns
ncons <- 10        # rows
n_nonzero <- round(0.02*nvars*ncons) # approximate; there may actually be fewer values
set.seed(13)

# the first table
Amat <- data.frame(
    i=sample.int(ncons, n_nonzero, replace=TRUE),
    j=sample.int(nvars, n_nonzero, replace=TRUE),
    x=runif(n_nonzero)
)
setDT(Amat)
Amat <- unique(Amat, by=c("i", "j"))
AmatSparse <- sparseMatrix(
    i=Amat[,get("i")], j=Amat[,get("j")], x=Amat[,get("x")],
    dims=c(2500, 10^7L)
)
AmatSFBM <- as_SFBM(AmatSparse, backingfile="sparsemat", compact = FALSE)

As you can see, I know the dimensions of the final matrix beforehand and have set it accordingly.

Now I want to add some rows, like this:

for (iter in 2:250) {
    Amat <- data.frame(
        i=sample.int(ncons, n_nonzero, replace=TRUE),
        j=sample.int(nvars, n_nonzero, replace=TRUE),
        x=runif(n_nonzero)
    )
    setDT(Amat)
    Amat <- unique(Amat, by=c("i", "j"))
    # shift the row indices into this iteration's block of ncons rows
    Amat[,i:=i+(iter-1)*ncons]

    # this does not work:
    AmatSFBM[Amat[,get("i")], Amat[,get("j")]] <- Amat[,get("x")]
}

However, the [<- operator does not seem to be supported for SFBM objects.

Is there any way to build an SFBM object other than calling as_SFBM on a sparse matrix? For example,

  • can I add two SFBM objects of the same dimensions?
  • can I create an SFBM object from a CSV file or similar?

Either would be fine.
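As background on why element-wise assignment is awkward here: assuming an SFBM uses a compressed sparse column (CSC) layout like Matrix's dgCMatrix (an assumption about the on-disk format, not something stated in this thread), writing arbitrary (i, j) entries would mean shifting all the stored values that follow, whereas appending whole columns only appends to the stored arrays. The sketch below illustrates the cheap case with two small made-up in-memory matrices, A and B.

```r
library(Matrix)

# Two small made-up blocks with the same number of rows.
A <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(10, 20, 30), dims = c(3, 2))
B <- sparseMatrix(i = c(2, 3),    j = c(1, 2),    x = c(40, 50),     dims = c(3, 2))

# Reference result: column-wise concatenation done by Matrix.
AB <- cbind(A, B)

# The same result obtained by pure appending on the CSC slots:
#   i = zero-based row indices, x = values, p = column pointers.
appended_i <- c(A@i, B@i)
appended_x <- c(A@x, B@x)
appended_p <- c(A@p, B@p[-1] + length(A@x))  # shift B's pointers past A's values

same_layout <- all(AB@i == appended_i) &&
  all(AB@x == appended_x) &&
  all(AB@p == appended_p)
print(same_layout)
```

Nothing that was already stored for A has to move, which is what makes growing a column-compressed matrix column by column attractive for a file-backed format.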

Answer 1

Score: 3

The SFBM class has an $add_columns() method that you can use to grow your matrix iteratively. Because it appends columns rather than rows, each component matrix in the code below has all 2500 rows and a block of 40,000 columns. Generally, when you are memory-constrained, it is a good idea to avoid unnecessary intermediate assignments. In the following code I first write a function that generates the component sparse matrices, then create a starting matrix, and finally add the component matrices one by one. I've limited it to 9 iterations for this example, but you can set it to 249 to get your full matrix.

library(bigsparser)
library(data.table)
library(Matrix)

set.seed(13)

# Function to generate component matrix
# (the \(x) lambda shorthand and the |> pipe require R >= 4.1)
generate_sparse_mat <- \(nrow = 2500, ncol = 40000, n_nonzero = round(0.02*nrow*ncol)) {
  data.table(
    i = sample.int(nrow, n_nonzero, replace = TRUE),
    j = sample.int(ncol, n_nonzero, replace = TRUE),
    x = runif(n_nonzero)
  ) |>
    unique(by = c("i", "j")) |>
    as.list() |>
    c(dims = list(c(nrow, ncol))) |>
    do.call(what = sparseMatrix)
}

# Starting matrix
mat <- generate_sparse_mat() |>
  as_SFBM(compact = FALSE)

# Iteratively add columns
for (k in seq_len(9)) mat$add_columns(generate_sparse_mat(), offset_i = 0)

mat
#> A Sparse Filebacked Big Matrix with 2500 rows and 400000 columns.
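Before committing to the full 250-block run, it may be worth checking the pattern on a toy problem where an in-memory reference still fits. The sketch below is only an illustration: the helper make_block, the block sizes and the density are made up, and it assumes that $nrow and $ncol are readable fields of the SFBM object and that as_SFBM accepts the dgCMatrix returned by Matrix::rsparsematrix.

```r
library(Matrix)
library(bigsparser)

set.seed(1)

# Hypothetical helper: one random sparse block with all rows and a slice of columns.
make_block <- function(nrow = 50, ncol = 20, density = 0.1) {
  rsparsematrix(nrow, ncol, density = density)
}

blocks <- replicate(4, make_block(), simplify = FALSE)

# Start the file-backed matrix from the first block ...
sfbm <- as_SFBM(blocks[[1]], backingfile = tempfile(), compact = FALSE)

# ... and append the remaining blocks column-wise, one at a time.
for (b in blocks[-1]) sfbm$add_columns(b, offset_i = 0)

# In-memory reference, only feasible because the example is tiny.
reference <- do.call(cbind, blocks)

sfbm_dims <- c(sfbm$nrow, sfbm$ncol)
print(sfbm_dims)
print(dim(reference))
```

If the two sets of dimensions agree on the toy problem, the same loop can be scaled up by swapping in the real block generator and iteration count.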

huangapple
  • Published on February 19, 2023, 01:03:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75494934.html