Efficient PyTorch or NumPy broadcasting not found to avoid bottleneck operations
Question
I have the following implementation in my PyTorch-based code which involves a nested for loop. The nested for loops, along with the `if` condition, make the code very slow to execute. I attempted to avoid the nested loops using the broadcasting concepts in NumPy and PyTorch, but that did not yield any result. Any help with avoiding the `for` loops will be appreciated.

Here are the links I have read: [PyTorch](https://pytorch.org/docs/stable/notes/broadcasting.html), [NumPy](https://numpy.org/doc/stable/user/basics.broadcasting.html).
#!/usr/bin/env python
# coding: utf-8
import torch
batch_size=32
mask=torch.FloatTensor(batch_size).uniform_() > 0.8
teacher_count=510
student_count=420
feature_dim=750
student_output=torch.zeros([batch_size,student_count])
teacher_output=torch.zeros([batch_size,teacher_count])
student_adjacency_mat=torch.randint(0,1,(student_count,student_count))
teacher_adjacency_mat=torch.randint(0,1,(teacher_count,teacher_count))
student_feat=torch.rand([batch_size,feature_dim])
student_graph=torch.rand([student_count,feature_dim])
teacher_feat=torch.rand([batch_size,feature_dim])
teacher_graph=torch.rand([teacher_count,feature_dim])
for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i]=student_output[m][i]+student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i]=teacher_output[m][i]+teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])
Answer 1

Score: 1
Problem statement
The operation you are looking to perform is quite straightforward. If you look closely at your loop:
for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i] += student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i] += teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])
Elements relevant to us:

- You have two operations separated based on a mask, which can ultimately be computed separately.
- Each operation loops over the entire adjacency matrix, i.e. `student_count²` iterations.
- The assignment operation comes down to `output[m,i] += adj_matrix[i,j] * ⟨feats[m], graph[j]⟩`, where `adj_matrix[i,j]` is a scalar (see the sketch just below).
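In particular, all the inner `torch.dot` calls can be batched into a single matrix product; a quick sketch reusing the question's tensor names:

# every pairwise dot product torch.dot(student_feat[m], student_graph[j]) at once:
# (batch_size, feature_dim) @ (feature_dim, student_count) -> (batch_size, student_count)
dots = student_feat @ student_graph.T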
Using torch.einsum

This is a typical use case for `torch.einsum`. You can read more on this thread, where I also happen to have written an answer.

If we keep away from all implementation details, the formulation with `torch.einsum` is rather self-explanatory:

o = torch.einsum('ij,mf,jf->mi', adj_matrix, feats, graph)

In pseudo-code, this comes down to:

o[m,i] += adj_matrix[i,j]*feats[m,f]*graph[j,f]

for all `i`, `j`, `m`, and `f` in your domain of interest, which is precisely the desired operation.

Combined with the mask expanded to the appropriate form with `M = mask[:,None]`, this gives you for the student tensor:

>>> student = M*torch.einsum('ij,mf,jf->mi', student_adjacency_mat, student_feat, student_graph)

For the teacher result, you can invert the mask with `~M`:

>>> teacher = ~M*torch.einsum('ij,mf,jf->mi', teacher_adjacency_mat, teacher_feat, teacher_graph)
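As a quick sanity check, here is a minimal sketch comparing the vectorized expression against the original loop; the sizes are small hypothetical ones so the reference loop stays fast, and the adjacency matrix is cast to float so the dtypes match:

import torch

torch.manual_seed(0)
B, S, F = 4, 6, 5  # small dummy sizes, for checking only
mask = torch.rand(B) > 0.5
adj = torch.randint(0, 2, (S, S)).float()
feat = torch.rand(B, F)
graph = torch.rand(S, F)

# reference: the original nested loop (student branch only)
ref = torch.zeros(B, S)
for m in range(B):
    if mask[m]:
        for i in range(S):
            for j in range(S):
                ref[m, i] += adj[i, j] * torch.dot(feat[m], graph[j])

# vectorized version
M = mask[:, None]
out = M * torch.einsum('ij,mf,jf->mi', adj, feat, graph)

print(torch.allclose(ref, out, atol=1e-5))  # expected: True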
Using torch.matmul

Alternatively, since this is a rather simple application of `torch.einsum`, you can also get away with two calls to `torch.matmul`. Given `A` and `B`, two matrices indexed by `ik` and `kj` respectively, `A@B` corresponds to `ik@kj -> ij`. Therefore you can get the desired result with:

>>> g = student_feat@student_graph.T  # mf@jf.T -> mf@fj -> mj
>>> g@student_adjacency_mat.T         # mj@ij.T -> mj@ji -> mi

See how the two steps relate to the `torch.einsum` call with `'ij,mf,jf->mi'`: first `mf,jf->mj`, followed by `mj,ij->mi`.
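For completeness, a small sketch checking that both formulations agree; casting the adjacency matrix to float is an assumption needed so `matmul` and `einsum` operate on matching dtypes:

adj = student_adjacency_mat.float()

g = student_feat @ student_graph.T  # mf@fj -> mj
via_matmul = g @ adj.T              # mj@ji -> mi
via_einsum = torch.einsum('ij,mf,jf->mi', adj, student_feat, student_graph)

print(torch.allclose(via_matmul, via_einsum, atol=1e-4))  # expected: True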
Side note: your current dummy student and teacher adjacency matrices are initialized with zeros, since the upper bound of `torch.randint(0, 1, ...)` is exclusive and it can only ever sample 0. Maybe you meant to have:

student_adjacency_mat=torch.randint(0,2,(student_count,student_count)).float()
teacher_adjacency_mat=torch.randint(0,2,(teacher_count,teacher_count)).float()