Efficient Method to pass large arrays from Matlab to C++ Eigen using Map class
Question
I am trying to pass large complex matrices from MATLAB to a C++ MEX API function that uses the Eigen library. I've come up with two approaches that use Eigen's Map class to map the data of an input TypedArray of complex doubles onto an Eigen complex double matrix. I believe that using the Map class shouldn't incur any copy operations, so the execution time should be trivial regardless of the size of the matrix.
Implementation 1:
matlab::data::TypedArray<std::complex<double>> inIDXT = std::move(inputs[1]);
Eigen::MatrixXcd idxt = Eigen::Map< Eigen::MatrixXcd > (inIDXT.release().get(), numEl*na, nPoints);
Implementation 2:
matlab::data::TypedArray<std::complex<double>> inIDXT = std::move(inputs[1]);
auto idxtData = inIDXT.release();
Eigen::Map< Eigen::MatrixXcd > idxt(idxtData.get(), numEl*na, nPoints);
I measured the execution time using std::chrono::steady_clock::now(), and I compiled and ran this in MATLAB using the mex command with the MinGW64 compiler, which I believe is equivalent to GCC 6.3 (unfortunately, more recent versions are not supported).
I found that for a complex double input matrix of size 14400x73728, Implementation 2 takes roughly 30 to 40 seconds, whereas Implementation 1 takes about 150 seconds. I know this is not the best way to profile code, but I am still surprised that either approach takes any noticeable time at all, let alone that there is a major difference between the two implementations.
What is going on here? If there is genuinely no copy operation, why does this operation consume time? Am I missing some subtlety? Or am I being naive, and constructing a Map object for a matrix this large inevitably takes some time? Also, why is there a difference in runtime between the two versions? As far as I can tell, they should be the same.
Answer 1
Score: 4
There are some issues and misconceptions here:
First, note the documentation of release():
> Release the underlying buffer from the Array. If the Array is shared, a copy of the buffer is made; otherwise, no copy is made. After the buffer is released, the array contains no elements
Its return value is a buffer_ptr_t<T>, which is a unique_ptr. That has three consequences:
- The cost of the second implementation may come from that copy, if the array was shared.
- The first implementation is a bit dangerous because you don't retain the unique_ptr. However, your Map is destroyed at the end of the statement, so I think that use is okay.
- In the second implementation you retain the unique_ptr, but be careful: it has to live at least as long as the Map; otherwise the Map will reference deleted data (a dangling pointer).
The second point is that Eigen::MatrixXcd idxt = Eigen::Map< Eigen::MatrixXcd >(...) creates a copy. An Eigen::Matrix owns its data, so initializing a matrix from a Map copies every element. That explains the extra cost of Implementation 1.
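To make the ownership distinction concrete, here is a minimal standard-library sketch. It does not use Eigen or the MATLAB Data API: MatrixView and OwningMatrix are hypothetical stand-ins for Eigen::Map and Eigen::MatrixXcd respectively, and the unique_ptr plays the role of the buffer returned by release(). Under those assumptions, it illustrates that a view only stores a pointer, while an owning matrix built from that view copies every element.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <vector>

using cd = std::complex<double>;

// Hypothetical stand-in for a non-owning view such as Eigen::Map:
// it stores only a pointer and the dimensions, never the elements.
struct MatrixView {
    cd* data;
    std::size_t rows;
    std::size_t cols;
    // Column-major element access (the default layout in MATLAB and Eigen).
    cd& operator()(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// Hypothetical stand-in for an owning matrix such as Eigen::MatrixXcd:
// constructing it from a view copies all rows * cols elements.
struct OwningMatrix {
    std::vector<cd> storage;
    std::size_t rows;
    std::size_t cols;
    explicit OwningMatrix(const MatrixView& v)
        : storage(v.data, v.data + v.rows * v.cols), rows(v.rows), cols(v.cols) {}
};

// A view aliases the buffer: writes through the view land in the buffer.
// The buffer must outlive the view, as it does within this function.
bool view_aliases_buffer() {
    std::unique_ptr<cd[]> buffer(new cd[6]());  // zero-initialized 2x3 buffer
    MatrixView view{buffer.get(), 2, 3};
    view(1, 2) = cd(1.0, -1.0);
    return view.data == buffer.get() && buffer[2 * 2 + 1] == cd(1.0, -1.0);
}

// An owning matrix has its own storage: later writes to the buffer do not reach it.
bool copy_is_independent() {
    std::unique_ptr<cd[]> buffer(new cd[6]());
    MatrixView view{buffer.get(), 2, 3};
    OwningMatrix copy(view);     // element-by-element copy happens here
    view(0, 0) = cd(5.0, 0.0);   // modifies the buffer only
    return copy.storage.data() != buffer.get() && copy.storage[0] == cd(0.0, 0.0);
}
```

Both functions return true. With a 14400x73728 complex double matrix, the copy in the owning case touches roughly 17 GB of data, whereas the view costs the same regardless of size.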