SciPy equivalent of np.minimum.outer
I need to compute the outer product of two (large) sparse vectors, but taking the minimum of the values instead of multiplying them. I need an efficient way to do this.
The built-in point-wise multiplication takes 30 ms, while my (naive) code takes 5 minutes:
%%timeit -r 1
# c1, c2 are sparse row vectors (shape 1 x n); iterate over their nonzero columns
minl = lil_matrix((c1.shape[1], c2.shape[1]))
for i1 in c1.nonzero()[1]:
    for i2 in c2.nonzero()[1]:
        minl[i1, i2] = min(c1[0, i1], c2[0, i2])
Answer 1
Score: 1
Maybe I should insist on a [mcve], but let's guess at a reasonable sample:
In [107]: from scipy import sparse
In [108]: c1 = sparse.random(1,100,.2,'csr')
In [109]: c2 = sparse.random(1,100,.2,'csr')
In [110]: def foo(c1,c2):
     ...:     minl = sparse.lil_matrix((c1.shape[1], c2.shape[1]))
     ...:     for i1 in c1.nonzero()[1]:
     ...:         for i2 in c2.nonzero()[1]:
     ...:             minl[i1, i2] = min(c1[0, i1], c2[0, i2])
     ...:     return minl
     ...:
In [111]: LL1 = foo(c1,c2)
In [112]: LL1
Out[112]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 400 stored elements in List of Lists format>
dense with broadcasting
For dense arrays a simple element-wise minimum for matching array lengths is:
In [118]: np.minimum(c1.A,c2.A).shape
Out[118]: (1, 100)
and with broadcasting we can do an 'outer' minimum:
In [119]: np.minimum(c1.T.A,c2.A).shape
Out[119]: (100, 100)
and that matches your iterative answer:
In [122]: np.allclose(np.minimum(c1.T.A,c2.A),LL1.A)
Out[122]: True
And comparative times:
In [123]: timeit LL1 = foo(c1,c2)
37.8 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [124]: timeit np.minimum(c1.T.A,c2.A)
257 µs ± 3.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
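For context, the np.minimum.outer in the title is just the ufunc outer method, and for dense arrays it is exactly this broadcast (a small sketch with made-up values):

```python
import numpy as np

a = np.array([0.2, 0.0, 0.7])
b = np.array([0.5, 0.1])
# The ufunc's outer method pairs every element of a with every element of b,
# which is the same as broadcasting a column against a row.
out = np.minimum.outer(a, b)
assert out.shape == (3, 2)
assert np.allclose(out, np.minimum(a[:, None], b[None, :]))
```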
But sparse doesn't do broadcasting.
I suggested building a coo matrix in the loop. Another possibility is to 'replicate' c1.T and c2 so we can do the minimum.
expanded sparse matrices
I can expand the matrices by multiplying by appropriate sparse 'ones' matrices:
In [135]: (c1.T*sparse.csr_matrix(np.ones((1,100)))).minimum(sparse.csr_matrix(np.ones((100,1))*c2))
Out[135]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 400 stored elements in Compressed Sparse Column format>
In [136]: np.allclose(_.A,LL1.A)
Out[136]: True
That times at
1.91 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
and packaged neatly:
def foo1(c1,c2):
    On1 = sparse.csr_matrix(np.ones((1,c2.shape[1])))
    On2 = sparse.csr_matrix(np.ones((c1.shape[1],1)))
    cc1 = c1.T*On1
    cc2 = On2*c2
    return cc1.minimum(cc2)
List iteration
Here's a "pure Python" iterative version:
def foo0(c1,c2):
    data, row, col = [],[],[]
    cc = c1.tocoo(); d1 = cc.data.tolist(); l1 = cc.col.tolist()
    cc = c2.tocoo(); d2 = cc.data.tolist(); l2 = cc.col.tolist()
    for i,i1 in enumerate(l1):
        for j,i2 in enumerate(l2):
            data.append(min(d1[i],d2[j]))
            row.append(i1)
            col.append(i2)
    minl = sparse.coo_matrix((data, (row,col)), shape=(c1.shape[1],c2.shape[1]))
    return minl
In [170]: foo0(c1,c2)
Out[170]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 400 stored elements in COOrdinate format>
In [171]: np.allclose(_.A,LL1.A)
Out[171]: True
In [172]: timeit foo0(c1,c2)
576 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
We don't have to iterate on c2; instead we can use array methods:
def foo02(c1,c2):
    data, row, col = [],[],[]
    cc = c1.tocoo(); d1 = cc.data.tolist(); l1 = cc.col.tolist()
    cc = c2.tocoo(); d2 = cc.data; l2 = cc.col
    for i,i1 in enumerate(l1):
        data.append(np.minimum(d1[i],d2))
        row.append([i1]*l2.shape[0])
        col.append(l2)
    data = np.hstack(data)
    row = np.hstack(row)
    col = np.hstack(col)
    minl = sparse.coo_matrix((data, (row,col)), shape=(c1.shape[1],c2.shape[1]))
    return minl
This has a modest time improvement on foo0, and may scale better.
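Going one step further (my sketch, not part of the original answer): the remaining Python loop can be eliminated entirely with np.repeat and np.tile on the COO data; foo03 is a hypothetical name:

```python
import numpy as np
from scipy import sparse

def foo03(c1, c2):
    # Pair every stored value of c1 with every stored value of c2:
    # repeat each c1 entry nnz(c2) times, tile c2's entries nnz(c1) times,
    # then take the element-wise minimum of the paired data.
    a, b = c1.tocoo(), c2.tocoo()
    data = np.minimum(np.repeat(a.data, b.nnz), np.tile(b.data, a.nnz))
    row = np.repeat(a.col, b.nnz)
    col = np.tile(b.col, a.nnz)
    return sparse.coo_matrix((data, (row, col)),
                             shape=(c1.shape[1], c2.shape[1]))
```

For nonnegative data this agrees with the dense broadcast np.minimum(c1.T.A, c2.A) above.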
Your suggested change
def foo11(c1,c2):
    On1 = c2.copy(); On1.data[:] = 1
    On2 = c1.T.copy(); On2.data[:] = 1
    cc1 = c1.T*On1
    cc2 = On2*c2
    return cc1.minimum(cc2)
1.21 ms ± 14 µs per loop
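As a sanity check (my addition, not part of the original answer; it assumes nonnegative data such as sparse.random produces), here is foo11 restated self-contained and compared against the dense broadcast result:

```python
import numpy as np
from scipy import sparse

def foo11(c1, c2):
    # The 'ones' copies carry only the sparsity patterns, so the two
    # products expand each vector onto the joint nonzero grid,
    # after which minimum() works element-wise.
    On1 = c2.copy(); On1.data[:] = 1
    On2 = c1.T.copy(); On2.data[:] = 1
    return (c1.T * On1).minimum(On2 * c2)

c1 = sparse.random(1, 100, 0.2, 'csr', random_state=0)
c2 = sparse.random(1, 100, 0.2, 'csr', random_state=1)
# With nonnegative data, min(x, 0) == 0 wherever either vector is zero,
# so the sparse result matches dense broadcasting.
assert np.allclose(foo11(c1, c2).toarray(),
                   np.minimum(c1.T.toarray(), c2.toarray()))
```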