Fast matrix operations in R

Question

A couple of simple questions:

  1. I want to build a matrix X of the following type:
N = 1000
A = seq(1, N, 1)
B = A
X = A %*% t(rep(1,N)) - rep(1,N) %*% t(B)

Working with this matrix X is rather slow: it takes a lot of memory for large N, and matrix multiplications involving it do not exploit its simple (linear) structure, which I suppose could be helpful(?).

Therefore, I want to avoid this, and the question is: what is the fastest way of calculating X (whenever I need it), possibly without creating the matrices A %*% t(rep(1,N)) and rep(1,N) %*% t(B)?

  1. What's the fastest approach for different sorts of matrix multiplications nowadays? There are some new packages, which are supposedly fast, but it seems that base R still dominates?
library(microbenchmark)
A = array(rnorm(10^4), c(100,100))
B = array(rnorm(10^4), c(100,100))

microbenchmark(fastmatrix::hadamard(A, B), A * B, Rfast::mat.mult(A, B), A %*% B, Rfast::Crossprod(A, B), t(A) %*% B)

Unit: microseconds
                       expr     min       lq      mean   median       uq      max neval
 fastmatrix::hadamard(A, B) 174.626 264.2185 303.55052 277.3510 301.7825 1052.063   100
                      A * B   3.711  29.9485  28.81842  31.5185  33.9670   73.038   100
      Rfast::mat.mult(A, B) 760.087 839.6020 918.76155 891.4790 944.5455 1356.274   100
                    A %*% B 500.756 536.6595 602.09226 565.5765 601.5310 1502.310   100
     Rfast::Crossprod(A, B) 716.953 786.3320 854.55307 813.4165 874.3930 1589.097   100
                 t(A) %*% B 520.544 585.2830 666.63137 631.3350 703.5430 1322.987   100
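
For reference, base R's crossprod() and tcrossprod() compute t(A) %*% B and A %*% t(B) without materializing the transpose, and are worth including in a comparison like this; a minimal sketch (timings will of course depend on the BLAS in use):

library(microbenchmark)
A = array(rnorm(10^4), c(100,100))
B = array(rnorm(10^4), c(100,100))

## crossprod(A, B) is t(A) %*% B; tcrossprod(A, B) is A %*% t(B)
microbenchmark(t(A) %*% B, crossprod(A, B), A %*% t(B), tcrossprod(A, B))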

Answer 1

Score: 2

To answer your first question:

NN <- c(N, N)
Y <- .row(NN) - .col(NN)

identical(`storage.mode<-`(Y, "double"), X)
## [1] TRUE

Probably not much slower or faster to construct than your code (depending on your BLAS implementation), but certainly more transparent and typically more efficient, since the result has type integer rather than double. The "gotcha" here is that BLAS routines want double, so computing with the result can require a coercion ...
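
A minimal sketch of that one-time coercion (using N = 1000 as in the question); outer() is shown as an alternative that returns double directly, at the cost of double-precision storage:

N  <- 1000
NN <- c(N, N)
Y  <- .row(NN) - .col(NN)
typeof(Y)
## [1] "integer"

## coerce once before handing Y to BLAS-backed operations such as %*%
storage.mode(Y) <- "double"
typeof(Y)
## [1] "double"

## alternative: build the same values as doubles in one step,
## without the two rank-one intermediates from the question
A <- seq(1, N, 1)
Z <- outer(A, A, "-")
typeof(Z)
## [1] "double"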

To answer your second question, note that Y' = -Y, hence Y is "skew-symmetric":

identical(t(Y), -Y)
## [1] TRUE

Depending on what you are trying to do, you can take advantage of the fact that (IIRC) skew-symmetric matrices can be factorized as LDL'. Whether there is a package implementing that factorization (using it to solve linear systems, etc.) remains to be seen, but I am doubtful.

If what you are trying to do is just matrix multiplication, then I would not expect skew-symmetry to help at all, unless you are wanting to write specialized Fortran routines of your own, modeled after DSYMM ...
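
For completeness: what can help for plain multiplication is the rank-two structure of X itself (X[i, j] = A[i] - B[j]), since a product X %*% M can then be formed without ever materializing the N x N matrix. A minimal sketch, with mult_X() as a hypothetical helper name:

## X = A %*% t(rep(1, N)) - rep(1, N) %*% t(B), so
## X %*% M = A %*% (t(rep(1, N)) %*% M) - rep(1, N) %*% (t(B) %*% M)
##         = outer(A, colSums(M)) - rep(1, N) %*% crossprod(B, M)
mult_X <- function(A, B, M) {
  M <- as.matrix(M)
  outer(A, colSums(M)) - rep(1, length(A)) %*% crossprod(B, M)
}

## check against the explicit construction
N <- 1000
A <- seq(1, N, 1)
B <- A
M <- matrix(rnorm(N * 5), N, 5)
X <- A %*% t(rep(1, N)) - rep(1, N) %*% t(B)
all.equal(X %*% M, mult_X(A, B, M))
## [1] TRUE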
