Multiply scipy sparse matrix with a 3d numpy array
Question
I have the following matrices

```python
import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
```

and I would basically like to implement the following formula:

$$\sigma_k = \sum_{i=1}^N \sum_{j=1}^N A_{i,j} (x_{i,k} - x_{j,k})^2$$

I can calculate the inner difference like this:

```python
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
```

I would basically like to broadcast the element-wise multiplication between the scipy sparse matrix and each inner numpy array.

If A were allowed to be dense, then I could simply do

```python
np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
```

but A is sparse, and potentially huge, so this isn't an option. Of course, I could just reorder the axes and loop over the diff array with a for loop, but is there a faster way using numpy?

As @hpaulj pointed out, the current solution also forms an array of shape (150, 150, 20), which would immediately lead to memory problems, so that solution is not acceptable either.
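The loop fallback mentioned above can at least avoid materialising the full (150, 150, 20) cube by forming one dense 150×150 difference slice per column k and using the sparse matrix's `multiply`. This is a minimal sketch, not the asker's code; the density, seed, and `csr` format are arbitrary choices for illustration:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
a = sp.random(150, 150, density=0.05, format="csr", random_state=0)
x = rng.normal(0, 1, size=(150, 20))

# One dense (150, 150) slice at a time instead of the whole (150, 150, 20) cube
s = np.empty(x.shape[1])
for k in range(x.shape[1]):
    d = (x[:, [k]] - x[:, [k]].T) ** 2   # dense (150, 150) difference slice
    s[k] = a.multiply(d).sum()           # elementwise sparse * dense, then sum

# sanity check against the dense einsum formulation (fine at this small size)
s_dense = np.einsum("ij,ijk->k", a.toarray(),
                    (x[:, None, :] - x[None, :, :]) ** 2)
assert np.allclose(s, s_dense)
```

This bounds peak memory at one dense slice, though it still computes every squared difference, including those multiplied by zeros of `a`.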
Answer 1
Score: 2
```python
import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum("ij,ijk->k", A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more zeros
# than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)
```

Seems reasonably fast; this completes in about a second:

```python
N = 100_000_000
print(f'Big test: initialising a {N}x{N} array')
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))
print('Big test: calculating')
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)
print(s)
```
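If A arrives in CSR or CSC format rather than COO, the same triplet trick applies after a cheap `tocoo()` conversion. A small sketch under assumed sizes and seeds (none of these values come from the answer):

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
N = 200
A_csr = scipy.sparse.random(N, N, density=0.05, format="csr", random_state=0)
x = rng.normal(size=(N, 3))

A_coo = A_csr.tocoo()  # expose the row/col/data triplets of the nonzeros
s = A_coo.data.dot((x[A_coo.row, :] - x[A_coo.col, :]) ** 2)

# verify against the dense einsum on this small instance
s_dense = np.einsum("ij,ijk->k", A_csr.toarray(),
                    (x[:, None, :] - x[None, :, :]) ** 2)
assert np.allclose(s, s_dense)
```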
Answer 2
Score: -2
Try the following code:
```python
import numpy as np

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```
For a sparse matrix, the following code can be used:
```python
import scipy.sparse as sp
import numpy as np

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```
where a is a sparse matrix in COO (Coordinate Format) of shape [150, 150].

