将scipy稀疏矩阵与一个3D numpy数组相乘。

huangapple go评论81阅读模式
英文:

Multiply scipy sparse matrix with a 3d numpy array

问题

我有以下的矩阵

```python
a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))

我基本上想要实现以下的公式

将scipy稀疏矩阵与一个3D numpy数组相乘。

我可以像这样计算内部差异

diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)

我基本上想要在scipy稀疏矩阵和每个内部numpy数组之间广播元素级乘法。

如果A允许是密集的,那么我可以简单地这样做

np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)

但A是稀疏的,而且可能很大,所以这不是一个选择。当然,我可以重新排序轴并使用for循环在diff数组上循环,但是否有更快的使用numpy的方法呢?

正如@hpaulj指出的,当前解决方案还形成一个形状为(150, 150, 20)的数组,这也会立即导致内存问题,所以这个解决方案也不行。


<details>
<summary>英文:</summary>

I have the following matrices

```python
a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))

and I would basically like to implement the following formula

将scipy稀疏矩阵与一个3D numpy数组相乘。

I can calculate the inner difference like this

diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -&gt; (150, 150, 20)

a.shape  # -&gt; (150, 150)

I would basically like to broadcast the element-wise multiplication between the scipy sparse matrix and each internal numpy array.

If A was allowed to be dense, then I could simply do

np.einsum(&quot;ij,ijk-&gt;k&quot;, a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)

but A is sparse, and potentially huge, so this isn't an option. Of course, I could just reorder the axes and loop over the diff array with a for loop, but is there a faster way using numpy?

As @hpaulj pointed out, the current solution also forms an array of shape (150, 150, 20), which would also immediately lead to problems with memory, so this solution would not be okay either.

答案1

得分: 2

import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum("ij,ijk->k", A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more zeros
# than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)

Seems reasonably fast; this completes in about a second:

N = 100_000_000
print(f'Big test: initialising a {N}x{N} array')
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))

print('Big test: calculating')
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)

print(s)
英文:
import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum(&quot;ij,ijk-&gt;k&quot;, A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more zeros
# than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)

Seems reasonably fast; this completes in about a second:

N = 100_000_000
print(f&#39;Big test: initialising a {N}x{N} array&#39;)
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))

print(&#39;Big test: calculating&#39;)
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)

print(s)

答案2

得分: -2

请尝试以下代码:

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)
b = np.einsum("ij,ijk->k", a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

对于稀疏矩阵,可以使用以下代码:

import scipy.sparse as sp
import numpy as np
a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)
b = np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

其中a是形状为[150,150]的COO(坐标格式)稀疏矩阵。

英文:

Try the following code:

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -&gt; (150, 150, 20)

a.shape  # -&gt; (150, 150)
b = np.einsum(&quot;ij,ijk-&gt;k&quot;, a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

For sparse matrix the following code can be used:

import scipy.sparse as sp
import numpy as np
a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -&gt; (150, 150, 20)

a.shape  # -&gt; (150, 150)
b = np.einsum(&quot;ij,ijk-&gt;k&quot;, a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

where a is sparse matrix in COO (Coordinate Format) of shape [150,150]

huangapple
  • 本文由 发表于 2023年3月7日 17:18:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/75660016.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定