英文:
Python sparse matrix C with elements c_ij = sum_j min(a_ij, b_ji) from sparse matrices A and B
问题
我有两个稀疏矩阵 A 和 B,其中 A 和 B 拥有相同的列数(但可能有不同的行数)。我尝试获取一个稀疏矩阵 C,其中元素 c_ij = sum_j min(a_ij, b_ji),其中 a_ij 和 b_ij 分别是 A 和 B 的元素。
这是否可以高效地完成,最好是稀疏地?即使密集也可以,但所涉及的矩阵将非常庞大(数万行)且非常稀疏。
细节:
这类似于标准(数学)矩阵乘法,但与元素的乘积求和不同,我想要求元素的最小值之和。
举个例子会有帮助。使用标准矩阵乘法 A * B 给出元素 c_ij = sum_j a_ij * b_ji,其中 a_ij 在 A 中,b_ij 在 B 中,i 和 j 是行和列索引,分别。
例如,
import numpy as np
from scipy.sparse import csr_matrix
A = csr_matrix([[1, 2, 0],
[0, 0, 3],
[4, 0, 5]])
B = csr_matrix([[2, 0, 0],
[0, 0, 3],
[6, 0, 0]])
C = A * B
print(f"\nC = A * B = \n{C}")
给出
>>>
C = A * B =
(0, 2) 6
(0, 0) 2
(1, 0) 18
(2, 0) 38
其中
c_31 = 38 = sum_j ( a_3j * b_j1 ) = 4 * 2 + 0 * 0 + 5 * 6 = 38。
对于 C 的每个元素,我希望最小值之和 min(a_ij * b_ji) 而不是乘积之和 a_ij * b_ji:
c_31 = sum_j min(a_3j, b_j1) = min(4, 2) + min(0, 0) + min(5, 6) = 7。
英文:
I have two sparse matrices A and B where A and B have the same number of columns (but potentially different numbers of rows). I'm trying to get a sparse matrix C with elements c_ij = sum_j min(a_ij,b_ji), where a_ij and b_ij are elements of A and B, respectively.
Can this be done efficiently, and ideally sparsely? Even densely would be okay, but the matrices involved will be huge (tens of thousands of rows) and very sparse.
Details:
The analogy is with standard (mathematical) matrix multiplication, but instead of summing the product of elements, I want to sum the minimum of elements.
An example will help. Using standard matrix multiplication A * B gives elements c_ij = sum_j a_ij * b_ji for elements a_ij in A and b_ij in B with i and j as row and column indices, respectively.
For example,
import numpy as np
from scipy.sparse import csr_matrix
A = csr_matrix([[1, 2, 0],
[0, 0, 3],
[4, 0, 5]])
B = csr_matrix([[2, 0, 0],
[0, 0, 3],
[6, 0, 0]])
C = A * B
print(f"\nC = A * B = \n{C}")
gives
>>>
C = A * B =
(0, 2) 6
(0, 0) 2
(1, 0) 18
(2, 0) 38
where
c_31 = 38 = sum_j ( a_3j * b_j1 ) = 4 * 2 + 0 * 0 + 5 * 6 = 38.
For each element of C, I want the sum of minimums min(a_ij * b_ji) instead of the sum of products a_ij * b_ji:
c_31 = sum_j min(a_3j,b_j1) = min(4,2) + min(0,0) + min(5,6) = 7.
答案1
得分: 1
在正常的稠密数组中,这可以更容易地使用三维数组来完成,但在稀疏形式中无法使用。相反,
import numpy as np
from scipy.sparse import csr_array, coo_array
# If you're able to start in COO format instead of CSR format, it will make the code below simpler
A = csr_array([
[1, 2, 0],
[0, 0, 3],
[4, 0, 5],
[4, 0, 5],
[1, 1, 6],
])
B = csr_array([
[2, 0, 0, 8],
[0, 0, 3, 2],
[6, 0, 0, 4],
])
m, n = A.shape
n, p = B.shape
# first min argument based on A
a_coo = A.tocoo(copy=False)
a_row_idx = ((a_coo.col + a_coo.row*(n*p))[None, :] + np.arange(0, n*p, n)[:, None]).ravel()
a_min = coo_array((
np.broadcast_to(A.data, (p, A.size)).ravel(), # data: row-major repeat
(
a_row_idx, # row indices
np.zeros_like(a_row_idx), # column indices
),
), shape=(m*n*p, 2))
# test; deleteme
a_min_dense = np.zeros((m*n*p, 2), dtype=int)
a_min_dense[:, 0] = np.repeat(A.toarray(), p, axis=0).ravel()
assert np.allclose(a_min_dense, a_min.toarray())
# second min argument based on B
b_coo = B.tocoo(copy=False)
b_row_idx = ((b_coo.row + b_coo.col*n)[None, :] + np.arange(0, m*n*p, n*p)[:, None]).ravel()
b_min = coo_array((
np.broadcast_to(B.data, (m, B.size)).ravel(), # data: row-major repeat
(
b_row_idx, # row indices
np.ones_like(b_row_idx), # column indices
),
), shape=(m*n*p, 2))
# test; deleteme
b_min_dense = np.zeros((m*n*p, 2), dtype=int)
b_min_dense[:, 1] = np.tile(B.T.toarray().ravel(), (1, m)).ravel()
assert np.allclose(b_min_dense, b_min.toarray())
# Alternative to this: use two one-dimensional columns and hstack
ab = (a_min + b_min).min(axis=1)
addend = ab.reshape((-1, n))
# scipy sparse summation input is sparse, but output is NOT SPARSE. It's an np.matrix.
# If that matters, you need to run a more complex operation on 'ab' that manipulates indices.
# deleteme
total = addend.sum(axis=1).reshape((m, p))
# Example true sparse sum
separators = np.insert(
np.nonzero(np.diff(addend.row))[0] + 1,
0, 0,
)
sparse_total = coo_array((
np.add.reduceat(addend.data, separators),
(
addend.row[separators],
np.zeros_like(separators),
)
), shape=(m*p, 1)).reshape((m, p))
# test; deleteme
ab_dense = (a_min_dense + b_min_dense).min(axis=1)
sum_dense = ab_dense.reshape((-1, n)).sum(axis=1).reshape((m, p))
assert np.allclose(ab.toarray().ravel(), ab_dense)
assert np.allclose(sum_dense, total)
assert np.allclose(sum_dense, sparse_total.toarray())
print(sum_dense)
其他注意事项:不要使用 _matrix
;它已经被废弃,建议使用 _array
。
上面的代码是针对与您的示例矩阵不同的示例矩阵操作的,有意如此,因为当所有维度的大小都相同时,很难可靠地开发数组操作代码。然而,当我将它们修改为与您的原始数组相同的大小时,得到的结果是
[[1 0 2]
[3 0 0]
[7 0 0]]
英文:
In normal dense arrays this would be done more easily with a three-dimensional array, but those are unavailable in sparse form. Instead,
import numpy as np
from scipy.sparse import csr_array, coo_array
# If you're able to start in COO format instead of CSR format, it will make the code below simpler
A = csr_array([
[1, 2, 0],
[0, 0, 3],
[4, 0, 5],
[4, 0, 5],
[1, 1, 6],
])
B = csr_array([
[2, 0, 0, 8],
[0, 0, 3, 2],
[6, 0, 0, 4],
])
m, n = A.shape
n, p = B.shape
# first min argument based on A
a_coo = A.tocoo(copy=False)
a_row_idx = ((a_coo.col + a_coo.row*(n*p))[None, :] + np.arange(0, n*p, n)[:, None]).ravel()
a_min = coo_array((
np.broadcast_to(A.data, (p, A.size)).ravel(), # data: row-major repeat
(
a_row_idx, # row indices
np.zeros_like(a_row_idx), # column indices
),
), shape=(m*n*p, 2))
# test; deleteme
a_min_dense = np.zeros((m*n*p, 2), dtype=int)
a_min_dense[:, 0] = np.repeat(A.toarray(), p, axis=0).ravel()
assert np.allclose(a_min_dense, a_min.toarray())
# second min argument based on B
b_coo = B.tocoo(copy=False)
b_row_idx = ((b_coo.row + b_coo.col*n)[None, :] + np.arange(0, m*n*p, n*p)[:, None]).ravel()
b_min = coo_array((
np.broadcast_to(B.data, (m, B.size)).ravel(), # data: row-major repeat
(
b_row_idx, # row indices
np.ones_like(b_row_idx), # column indices
),
), shape=(m*n*p, 2))
# test; deleteme
b_min_dense = np.zeros((m*n*p, 2), dtype=int)
b_min_dense[:, 1] = np.tile(B.T.toarray().ravel(), (1, m)).ravel()
assert np.allclose(b_min_dense, b_min.toarray())
# Alternative to this: use two one-dimensional columns and hstack
ab = (a_min + b_min).min(axis=1)
addend = ab.reshape((-1, n))
# scipy sparse summation input is sparse, but output is NOT SPARSE. It's an np.matrix.
# If that matters, you need to run a more complex operation on 'ab' that manipulates indices.
# deleteme
total = addend.sum(axis=1).reshape((m, p))
# Example true sparse sum
separators = np.insert(
np.nonzero(np.diff(addend.row))[0] + 1,
0, 0,
)
sparse_total = coo_array((
np.add.reduceat(addend.data, separators),
(
addend.row[separators],
np.zeros_like(separators),
)
), shape=(m*p, 1)).reshape((m, p))
# test; deleteme
ab_dense = (a_min_dense + b_min_dense).min(axis=1)
sum_dense = ab_dense.reshape((-1, n)).sum(axis=1).reshape((m, p))
assert np.allclose(ab.toarray().ravel(), ab_dense)
assert np.allclose(sum_dense, total)
assert np.allclose(sum_dense, sparse_total.toarray())
print(sum_dense)
Other notes: don't use _matrix
; it's deprecated in favour of _array
.
The above operates on different example matrices from yours, deliberately, because it's impossible to reliably develop array manipulation code when all of the dimensions are of the same size. However, when I modify them to be the same as your original arrays, I get
[[1 0 2]
[3 0 0]
[7 0 0]]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论