Multiply scipy sparse matrix with a 3d numpy array

Question
I have the following matrices

```python
import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
```

and I would basically like to implement the following formula:

    \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

I can calculate the inner difference like this:

```python
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
```

I would basically like to broadcast the element-wise multiplication between the scipy sparse matrix and each inner numpy array.

If `A` were allowed to be dense, I could simply do

```python
np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
```

but `A` is sparse, and potentially huge, so this isn't an option. Of course, I could just reorder the axes and loop over the `diff` array with a for loop, but is there a faster way using numpy?

As @hpaulj pointed out, the approach above also forms an array of shape (150, 150, 20), which would immediately lead to memory problems, so that solution isn't acceptable either.
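The axis-reordering loop fallback mentioned above can be sketched as follows (a minimal illustration, not the desired solution; `format="csr"` is an assumption, since the question uses `sp.random`'s default):

```python
import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150, format="csr")
x = np.random.normal(0, 1, size=(150, 20))

# Loop over the k axis: each iteration builds one dense (150, 150) slice,
# multiplies it element-wise with the sparse matrix, and sums the result.
out = np.empty(x.shape[1])
for k in range(x.shape[1]):
    d = (x[:, [k]] - x[None, :, k]) ** 2  # dense (150, 150) slice for feature k
    out[k] = a.multiply(d).sum()
```

This avoids ever materialising the full (150, 150, 20) array, at the cost of a Python-level loop over the 20 slices.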
Answer 1
Score: 2
```python
import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum("ij,ijk->k", A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more
# zeros than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)
```

Seems reasonably fast; this completes in about a second:

```python
N = 100_000_000
print(f'Big test: initialising a {N}x{N} array')
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))
print('Big test: calculating')
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)
print(s)
```
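Applied to the question's original setup, the same COO trick reads as follows (a sketch; `sp.random` returns COO format by default, and the `density` value here is an assumption):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
a = sp.random(150, 150, density=0.05, format="coo", random_state=rng)
x = rng.normal(0, 1, size=(150, 20))

# Squared differences are computed only for the stored (nonzero) entries of a.
diff_nz = (x[a.row, :] - x[a.col, :]) ** 2  # shape (nnz, 20)
sigma = a.data.dot(diff_nz)                 # shape (20,)
```

Memory scales with `nnz * 20` instead of `150 * 150 * 20`, which is what makes this viable for huge sparse matrices.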
Answer 2
Score: -2
Try the following code:

```python
import numpy as np

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```

For a sparse matrix, the following code can be used:

```python
import scipy.sparse as sp
import numpy as np

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```

where `a` is a sparse matrix in COO (coordinate) format of shape (150, 150).
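Rather than densifying `a` with `toarray()`, the squared difference can also be expanded algebraically, `(x_ik - x_jk)^2 = x_ik^2 + x_jk^2 - 2*x_ik*x_jk`, so the whole sum reduces to row/column sums of `a` plus one sparse matrix-vector product. This technique is not from either answer; a sketch, with an assumed density:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
a = sp.random(150, 150, density=0.05, format="csr", random_state=rng)
x = rng.normal(0, 1, size=(150, 20))

# sum_ij A_ij (x_ik - x_jk)^2
#   = sum_i row_sum_i * x_ik^2 + sum_j col_sum_j * x_jk^2 - 2 * sum_i x_ik * (A x)_ik
row_sum = np.asarray(a.sum(axis=1)).ravel()  # shape (150,)
col_sum = np.asarray(a.sum(axis=0)).ravel()  # shape (150,)
sigma = row_sum @ x**2 + col_sum @ x**2 - 2 * np.einsum("ik,ik->k", x, a @ x)
```

This never materialises even the per-nonzero difference array, using only operations that stay sparse until the final (150, 20)-sized products.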