Multiply scipy sparse matrix with a 3d numpy array
Question
I have the following matrices

```python
import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
```

and I would basically like to implement the following formula:

$$\sigma_k = \sum_{i=1}^N \sum_{j=1}^N A_{i,j} (x_{i,k} - x_{j,k})^2$$

I can calculate the inner difference like this:

```python
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
```

I would basically like to broadcast the element-wise multiplication between the scipy sparse matrix and each inner numpy array.

If A were allowed to be dense, then I could simply do

```python
np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
```

but A is sparse, and potentially huge, so this isn't an option. Of course, I could just reorder the axes and loop over the diff array with a for loop, but is there a faster way using numpy?

As @hpaulj pointed out, the current solution also forms an array of shape (150, 150, 20), which would immediately lead to memory problems, so that solution is not acceptable either.
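The loop fallback mentioned above can at least avoid materialising the full (150, 150, 20) cube by forming one dense 150×150 difference slice per column k and using the sparse matrix's `multiply`. This is a minimal sketch, not the asker's code; the density, seed, and `csr` format are arbitrary choices for illustration:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
a = sp.random(150, 150, density=0.05, format="csr", random_state=0)
x = rng.normal(0, 1, size=(150, 20))

# One dense (150, 150) slice at a time instead of the whole (150, 150, 20) cube
s = np.empty(x.shape[1])
for k in range(x.shape[1]):
    d = (x[:, [k]] - x[:, [k]].T) ** 2   # dense (150, 150) difference slice
    s[k] = a.multiply(d).sum()           # elementwise sparse * dense, then sum

# sanity check against the dense einsum formulation (fine at this small size)
s_dense = np.einsum("ij,ijk->k", a.toarray(),
                    (x[:, None, :] - x[None, :, :]) ** 2)
assert np.allclose(s, s_dense)
```

This bounds peak memory at one dense slice, though it still computes every squared difference, including those multiplied by zeros of `a`.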
Answer 1
Score: 2
```python
import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# \sigma_k = \sum_i^N \sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum("ij,ijk->k", A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more zeros
# than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)
```

Seems reasonably fast; this completes in about a second:

```python
N = 100_000_000
print(f'Big test: initialising a {N}x{N} array')
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))
print('Big test: calculating')
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)
print(s)
```
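If A arrives in CSR or CSC format rather than COO, the same triplet trick applies after a cheap `tocoo()` conversion. A small sketch under assumed sizes and seeds (none of these values come from the answer):

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
N = 200
A_csr = scipy.sparse.random(N, N, density=0.05, format="csr", random_state=0)
x = rng.normal(size=(N, 3))

A_coo = A_csr.tocoo()  # expose the row/col/data triplets of the nonzeros
s = A_coo.data.dot((x[A_coo.row, :] - x[A_coo.col, :]) ** 2)

# verify against the dense einsum on this small instance
s_dense = np.einsum("ij,ijk->k", A_csr.toarray(),
                    (x[:, None, :] - x[None, :, :]) ** 2)
assert np.allclose(s, s_dense)
```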
Answer 2
Score: -2
Try the following code:
```python
import numpy as np

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```
For a sparse matrix, the following code can be used:
```python
import scipy.sparse as sp
import numpy as np

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)
a.shape     # -> (150, 150)
b = np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()
```
where a is a sparse matrix in COO (Coordinate Format) of shape [150, 150].

