Fast matrix operations in R

Question

A couple of simple questions:

  1. I want to build a matrix X of the following type:
N = 1000
A = seq(1, N, 1)
B = A
X = A %*% t(rep(1,N)) - rep(1,N) %*% t(B)

Working with this matrix X is rather slow: it takes a lot of memory for large N, and matrix multiplications involving it do not exploit its simple (linear) structure, which I suppose could be helpful(?).

Therefore, I want to avoid this, and the question is: what is the fastest way of calculating X (whenever I need it), possibly without creating the matrices A %*% t(rep(1,N)) and rep(1,N) %*% t(B)?

  1. What's the fastest approach for different sorts of matrix multiplications nowadays? There are some new packages, which are supposedly fast, but it seems that base R still dominates?
library(microbenchmark)
A = array(rnorm(10^4), c(100,100))
B = array(rnorm(10^4), c(100,100))

microbenchmark(fastmatrix::hadamard(A, B), A * B, Rfast::mat.mult(A, B), A %*% B, Rfast::Crossprod(A, B), t(A) %*% B)

Unit: microseconds
                       expr     min       lq      mean   median       uq      max neval
 fastmatrix::hadamard(A, B) 174.626 264.2185 303.55052 277.3510 301.7825 1052.063   100
                      A * B   3.711  29.9485  28.81842  31.5185  33.9670   73.038   100
      Rfast::mat.mult(A, B) 760.087 839.6020 918.76155 891.4790 944.5455 1356.274   100
                    A %*% B 500.756 536.6595 602.09226 565.5765 601.5310 1502.310   100
     Rfast::Crossprod(A, B) 716.953 786.3320 854.55307 813.4165 874.3930 1589.097   100
                 t(A) %*% B 520.544 585.2830 666.63137 631.3350 703.5430 1322.987   100
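
For reference, base R's crossprod() and tcrossprod() compute t(A) %*% B and A %*% t(B) without materializing the transpose, and are worth including in a comparison like this; a minimal sketch (timings will of course depend on the BLAS in use):

library(microbenchmark)
A = array(rnorm(10^4), c(100,100))
B = array(rnorm(10^4), c(100,100))

## crossprod(A, B) is t(A) %*% B; tcrossprod(A, B) is A %*% t(B)
microbenchmark(t(A) %*% B, crossprod(A, B), A %*% t(B), tcrossprod(A, B))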

Answer 1

Score: 2

To answer your first question:

NN <- c(N, N)
Y <- .row(NN) - .col(NN)

identical(`storage.mode<-`(Y, "double"), X)
## [1] TRUE

Probably not much slower or faster to construct than your code (depending on your BLAS implementation), but certainly more transparent and typically more efficient, since the result has type integer rather than double. The "gotcha" here is that BLAS routines want double, so computing with the result can require a coercion ...
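
A minimal sketch of that one-time coercion (using N = 1000 as in the question); outer() is shown as an alternative that returns double directly, at the cost of double-precision storage:

N  <- 1000
NN <- c(N, N)
Y  <- .row(NN) - .col(NN)
typeof(Y)
## [1] "integer"

## coerce once before handing Y to BLAS-backed operations such as %*%
storage.mode(Y) <- "double"
typeof(Y)
## [1] "double"

## alternative: build the same values as doubles in one step,
## without the two rank-one intermediates from the question
A <- seq(1, N, 1)
Z <- outer(A, A, "-")
typeof(Z)
## [1] "double"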

To answer your second question, note that Y' = -Y, hence Y is "skew-symmetric":

identical(t(Y), -Y)
## [1] TRUE

Depending on what you are trying to do, you can take advantage of the fact that (IIRC) skew-symmetric matrices can be factorized as LDL'. Whether there is a package implementing that factorization (using it to solve linear systems, etc.) remains to be seen, but I am doubtful.

If what you are trying to do is just matrix multiplication, then I would not expect skew-symmetry to help at all, unless you are wanting to write specialized Fortran routines of your own, modeled after DSYMM ...
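
For completeness: what can help for plain multiplication is the rank-two structure of X itself (X[i, j] = A[i] - B[j]), since a product X %*% M can then be formed without ever materializing the N x N matrix. A minimal sketch, with mult_X() as a hypothetical helper name:

## X = A %*% t(rep(1, N)) - rep(1, N) %*% t(B), so
## X %*% M = A %*% (t(rep(1, N)) %*% M) - rep(1, N) %*% (t(B) %*% M)
##         = outer(A, colSums(M)) - rep(1, N) %*% crossprod(B, M)
mult_X <- function(A, B, M) {
  M <- as.matrix(M)
  outer(A, colSums(M)) - rep(1, length(A)) %*% crossprod(B, M)
}

## check against the explicit construction
N <- 1000
A <- seq(1, N, 1)
B <- A
M <- matrix(rnorm(N * 5), N, 5)
X <- A %*% t(rep(1, N)) - rep(1, N) %*% t(B)
all.equal(X %*% M, mult_X(A, B, M))
## [1] TRUE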
