Equivalent to C++ lapack dgesvd in Python
Question
I'm currently working my way through converting some C++ code from a paper to Python (a language I'm far more familiar with). I've gotten all the values up to this point in the code to match (I managed to get the C++ repo from GitHub to compile and can debug in Visual Studio to compare against my outputs in Python).
I seem to have hit a stumbling block with the way the C++ code calls LAPACK's dgesvd, as I'm not sure how the call is handled behind the scenes or how I can mimic the behaviour in Python.
The C++ code calls the LAPACK dgesvd as follows:
// Compute the SVD of K0
char save[]="S", nosave[]="N";
int nosavedim=1;
int info, lwork;
int cols_s = 316*dofn3_s;
lwork = 5*n3; // Change
work = (double *) malloc(lwork * sizeof(double));
int U0size;
Sigma_size = dofn3_f;
U0size = dofn3_f * dofn3_f;
Sigma = (double *)malloc(Sigma_size * sizeof(double));
U0 = (double *)malloc(U0size * sizeof(double));
VT0 = NULL;
dgesvd_(save, nosave, &dofn3_f, &cols_s, K0, &dofn3_f, Sigma, U0,
&dofn3_f, VT0, &nosavedim, work, &lwork, &info);
So, the variables going into the dgesvd call from C++ are:
Variable | Value |
---|---|
save | "S" |
nosave | "N" |
dofn3_f | 27 |
cols_s | 8532 |
K0 | 1D array of shape (230364,) |
dofn3_f | 27 |
Sigma | Return from SVD |
U0 | Return from SVD |
dofn3_f | 27 |
VT0 | NULL (don't return from SVD) |
nosavedim | 1 |
work | 1809 |
lwork | 135 |
info | Return from SVD |
The C++ code gets back:
Variable | Value |
---|---|
U0 | 1D array of shape (729,) |
Sigma | 1D array of shape (27,) |
From what I can tell from the LAPACK documentation, this essentially means it's asking dgesvd to return the first min(m,n) columns of U (the left singular vectors) in the array U0 and to produce no output for VT0 (which matches what's returned).
What I don't quite understand, though, is that the input array is 1D, yet the input matrix is supposed to be an M-by-N matrix. I'm assuming that the lapack function is converting it behind the scenes (or operating on it in the correct order based on the inputs)?
I've looked at the low-level SciPy wrapper for the LAPACK dgesvd and at SciPy's own svd, and tested both: they both require the input array to be of shape (m, n).
I've tried calling the low level Scipy wrapper as follows:
l_work = np.int32(5 * self.n3)
U0, Sigma, V0, info = linalg.lapack.dgesvd(k0, full_matrices=0, lwork=l_work)
This returns the error: ** On entry to DGESVD parameter number 13 had an illegal value.
If I remove the lwork kwarg it doesn't error, but returns:
Variable | Value |
---|---|
U0 | 1D array of shape (23064,) |
Sigma | 1D array of shape (1,) |
I've tried reshaping the input array to shape (27, 8532) with np.reshape((27, 8532)), using both the low-level wrapper and the standard SciPy implementation. This returns:
Variable | Value |
---|---|
U0 | 2D array of shape (27,27) |
Sigma | 1D array of shape (27,) |
Which, if I reshape the U0 array, gives me the same shapes as the C++ gets back from lapack, but my values are all quite different to those from the C++ results.
Is this simply a matter of reshaping the arrays in a different order somehow? And if so, how would I need to do this to ensure I get the same values back from the SVD calc as the C++ code?
EDIT
To provide some further context, if I run:
A = K0.reshape((27, 8532))
U0, Sigma, V0 = scipy.linalg.svd(A)
Here's a screenshot of a comparison of the values I'm seeing from the C++ code and my Python version. You can see the input array K0 matches, but the values returned from the SVD calcs are different. The K0 array being passed into the C++ LAPACK call is a 1D array, so it is evidently being interpreted as a 2D array of shape (m, n) somehow behind the scenes. This is what I need to be able to mimic in Python to ensure I get the same answer out of the SVD.
Answer 1
Score: 1
You should consider using scipy.linalg.svd instead of calling the underlying LAPACK routine.
The code you pasted seems to only care about U and S from the SVD, so getting the result you want might be as simple as:
U, S, _ = scipy.linalg.svd(K0)
with SciPy handling all the underlying memory shapes for you. Here, if K0 is (M,N) then U will be (M,M), which in your case should be (27,27) and have exactly 729 elements, just like the C code.
Calling the underlying LAPACK and BLAS code from Python is often not needed, as SciPy has wrappers for all the common routines.
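One caveat worth sketching: scipy.linalg.svd needs a 2D array, so a flat buffer has to be reshaped first, and the reshape order decides which matrix you actually decompose. The example below is only an illustration on a small made-up matrix (m, n, A_true and k0_flat are hypothetical stand-ins, not values from the question). It assumes the buffer is laid out the way LAPACK reads it, i.e. column-major with element (i, j) at flat index i + j*lda.

```python
import numpy as np
from scipy import linalg

# Hypothetical small stand-in for K0: m rows, n columns, stored column-major
m, n = 3, 5
rng = np.random.default_rng(0)
A_true = rng.standard_normal((m, n))
k0_flat = A_true.flatten(order="F")  # flat buffer as a Fortran/LAPACK caller holds it

# LAPACK addressing: element (i, j) sits at flat index i + j*lda (lda = m here)
lda = m
assert k0_flat[1 + 2 * lda] == A_true[1, 2]

A_col = k0_flat.reshape((m, n), order="F")  # column-major view: recovers A_true
A_row = k0_flat.reshape((m, n))             # default C order: a different matrix

# Singular values only, to compare the two interpretations against the reference
s_ref = linalg.svd(A_true, compute_uv=False)
s_col = linalg.svd(A_col, compute_uv=False)
s_row = linalg.svd(A_row, compute_uv=False)

print("column-major reshape matches:", np.allclose(s_col, s_ref))
print("row-major reshape matches:   ", np.allclose(s_row, s_ref))
```

With the default (row-major) reshape the shapes still come out as expected, which is why a shape check alone does not catch the problem.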
Answer 2
Score: 0
Right, so I've been down a bit of a rabbit hole today, but I think I've finally got this figured out.
If anyone comes searching for this in the future: it turns out the array being passed to the LAPACK dgesvd function in the C++ code is column-major. To get the same values in Python, I needed to convert the 1D column-major array to a 2D NumPy array for use in scipy.linalg.svd, then flatten and reverse the outputs to match the result of the C++ code:
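import numpy as np
from scipy import linalg

# k0 is the flat column-major buffer; cols_s and dofn3_f are as in the C++ code.
# Reshape to (cols_s, dofn3_f) and transpose to read it as a column-major (dofn3_f, cols_s) matrix
A = np.array(k0, order='F').reshape(cols_s, dofn3_f).T
u, s, vt = linalg.svd(A, full_matrices=True)
# Flatten in column-major order to get 1D arrays like the LAPACK output buffers
u = u.flatten('F')
s = s.flatten('F')
# Reverse both arrays
u0 = u[::-1]
s0 = s[::-1]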
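For anyone adapting this, here is a small sketch of two related points, using made-up sizes (4 and 9 standing in for 27 and 8532) and random data rather than the real K0. First, the reshape-then-transpose step gives the same matrix as a single reshape with order='F'. Second, singular vectors are only defined up to sign, so when checking U against another implementation it can help to compare column by column while ignoring sign. The helper same_up_to_sign is a hypothetical name introduced for illustration, and the sketch does not attempt to reproduce the reversal step above.

```python
import numpy as np
from scipy import linalg

dofn3_f, cols_s = 4, 9                      # small stand-ins for 27 and 8532
rng = np.random.default_rng(1)
k0 = rng.standard_normal(dofn3_f * cols_s)  # flat column-major buffer (random here)

# Two equivalent ways to read the buffer as a column-major (dofn3_f, cols_s) matrix
A1 = np.array(k0, order="F").reshape(cols_s, dofn3_f).T
A2 = k0.reshape((dofn3_f, cols_s), order="F")
assert np.array_equal(A1, A2)

# full_matrices=False avoids building the (cols_s, cols_s) Vt, which is large
# for the real problem; U is (dofn3_f, dofn3_f) either way because dofn3_f < cols_s
u, s, vt = linalg.svd(A2, full_matrices=False)
u_flat = u.flatten(order="F")  # column-major 1D layout, dofn3_f * dofn3_f entries


def same_up_to_sign(u_a, u_b, tol=1e-8):
    """Compare two sets of unit-norm singular vectors column by column, ignoring sign."""
    dots = np.abs(np.sum(u_a * u_b, axis=0))
    return bool(np.all(np.abs(dots - 1.0) < tol))


# The same decomposition with some columns negated is still a valid set of vectors
u_flipped = u * np.array([1, -1, 1, -1])
print(same_up_to_sign(u, u_flipped))
```

Singular values have no such ambiguity, so comparing s directly (for example with np.allclose) is usually the quickest first check.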