Efficient PyTorch or NumPy broadcasting not found to avoid bottleneck operations

Question

I have the following implementation in my PyTorch-based code, which involves nested for loops. The nested loops, together with the if conditions, make the code very slow to execute. I tried to remove the nested loops using the broadcasting concepts in NumPy and PyTorch, but without success. Any help with avoiding the for loops would be appreciated.

Here are the links I have read: [PyTorch][1], [NumPy][2].

[1]: https://pytorch.org/docs/stable/notes/broadcasting.html
[2]: https://numpy.org/doc/stable/user/basics.broadcasting.html

#!/usr/bin/env python
# coding: utf-8

import torch

batch_size=32
mask=torch.FloatTensor(batch_size).uniform_() > 0.8

teacher_count=510
student_count=420
feature_dim=750
student_output=torch.zeros([batch_size,student_count])
teacher_output=torch.zeros([batch_size,teacher_count])

student_adjacency_mat=torch.randint(0,1,(student_count,student_count))
teacher_adjacency_mat=torch.randint(0,1,(teacher_count,teacher_count))

student_feat=torch.rand([batch_size,feature_dim])
student_graph=torch.rand([student_count,feature_dim])
teacher_feat=torch.rand([batch_size,feature_dim])
teacher_graph=torch.rand([teacher_count,feature_dim])


for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i]=student_output[m][i]+student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i]=teacher_output[m][i]+teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])   

Answer 1

Score: 1

Problem statement

The operation you are looking to perform is quite straightforward. If you look closely at your loop:

for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i] += student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i] += teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])

Elements relevant to us:

  • The two operations are separated by the mask, so they can ultimately be computed separately.

  • Each operation loops over every entry of its adjacency matrix, i.e. student_count² (or teacher_count²) iterations per batch element.

  • The assignment operation comes down to

    output[m,i] += adj_matrix[i,j] * <feats[m], graph[j]>

    where adj_matrix[i,j] is a scalar and <·, ·> denotes the dot product.
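To make the last point concrete, here is a minimal sketch for a single output entry. The sizes, the name `feat_m` (standing in for `feats[m]`) and the index `i = 2` are arbitrary choices for illustration, not part of the original code. It checks that the inner loop over `j` is a weighted sum of dot products, which can be written without a loop:

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes chosen only for illustration.
count, dim = 4, 5
adj_matrix = torch.randint(0, 2, (count, count)).float()
feat_m = torch.rand(dim)        # stands in for feats[m]
graph = torch.rand(count, dim)
i = 2                           # an arbitrary output index

# Inner loop over j, as in the question.
looped = torch.tensor(0.0)
for j in range(count):
    looped = looped + adj_matrix[i, j] * torch.dot(feat_m, graph[j])

# Same sum: compute every <feat_m, graph[j]> at once, then weight by row i.
dots = graph @ feat_m           # shape (count,)
vectorized = torch.dot(adj_matrix[i], dots)

print(torch.allclose(looped, vectorized, atol=1e-5))
```

Doing the same for every `m` and `i` at once is what the vectorized formulations in the rest of this answer express.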


Using torch.einsum

This is a typical use case for torch.einsum. You can read more on this thread where I also happen to have written an answer.

If we keep away from all implementation details, the formulation with torch.einsum is rather self-explanatory:

o = torch.einsum('ij,mf,jf->mi', adj_matrix, feats, graph)

In pseudo-code, this comes down to:

o[m,i] += adj_matrix[i,j]*feats[m,f]*graph[j,f]

for all i, j, m, and f in your domain of interest, which is precisely the desired operation.
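As a sanity check, the einsum formulation can be compared against the explicit loops on small tensors. This is a sketch with made-up sizes (`batch`, `count`, `dim` are illustrative names), and the mask is left out so that only the contraction itself is tested:

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes so the reference loop finishes quickly.
batch, count, dim = 3, 4, 5
adj_matrix = torch.randint(0, 2, (count, count)).float()
feats = torch.rand(batch, dim)
graph = torch.rand(count, dim)

# Reference: the explicit loops from the question, without the mask.
expected = torch.zeros(batch, count)
for m in range(batch):
    for i in range(count):
        for j in range(count):
            expected[m, i] += adj_matrix[i, j] * torch.dot(feats[m], graph[j])

# The same contraction as a single einsum call.
o = torch.einsum('ij,mf,jf->mi', adj_matrix, feats, graph)

print(torch.allclose(o, expected, atol=1e-5))
```

The indices `j` and `f` are absent from the output subscript `mi`, so einsum sums over them, matching the accumulation in the loops.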

Combined with the mask expanded to a broadcastable shape with M = mask[:,None], this gives for the student tensor:

>>> student = M*torch.einsum('ij,mf,jf->mi', student_adjacency_mat, student_feat, student_graph)

For the teacher result, you can invert the mask with ~M:

>>> teacher = ~M*torch.einsum('ij,mf,jf->mi', teacher_adjacency_mat, teacher_feat, teacher_graph)
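The masking step can be sketched end to end as follows. The sizes and the hand-written `mask` are assumptions chosen for illustration; the variable names follow the question. Multiplying by the boolean mask zeroes the rows that the original `if` branches would have skipped:

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes; the mask is written out by hand for clarity.
student_count, teacher_count, feature_dim = 4, 5, 3
mask = torch.tensor([True, False, False, True, False, True])
batch_size = mask.shape[0]

student_adjacency_mat = torch.randint(0, 2, (student_count, student_count)).float()
teacher_adjacency_mat = torch.randint(0, 2, (teacher_count, teacher_count)).float()
student_feat = torch.rand(batch_size, feature_dim)
student_graph = torch.rand(student_count, feature_dim)
teacher_feat = torch.rand(batch_size, feature_dim)
teacher_graph = torch.rand(teacher_count, feature_dim)

# Shape (batch_size, 1): broadcasts across the columns of each output.
M = mask[:, None]

student_full = torch.einsum('ij,mf,jf->mi', student_adjacency_mat, student_feat, student_graph)
teacher_full = torch.einsum('ij,mf,jf->mi', teacher_adjacency_mat, teacher_feat, teacher_graph)

student = M * student_full    # rows where mask is False become zero
teacher = ~M * teacher_full   # rows where mask is True become zero

print(student.shape, teacher.shape)
```

Note that this computes both contractions for every batch row and then discards the masked ones. If that wasted work matters, indexing the inputs with `mask` before the einsum is one possible alternative.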

Using torch.matmul

Alternatively, since this is a rather simple application of torch.einsum, you can also get away with two calls to torch.matmul. Given two matrices A and B indexed by ik and kj respectively, A@B corresponds to ik@kj -> ij. Therefore you can get the desired result with:

>>> g = student_feat@student_graph.T # mf@jf.T -> mf@fj -> mj
>>> g@student_adjacency_mat.T        # mj@ij.T -> mj@ji -> mi

See how the two steps relate to the torch.einsum call with 'ij,mf,jf->mi'. First mf,jf->mj, followed by mj,ij->mi.
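The equivalence between the two matmul steps and the einsum call can be checked numerically. This is a sketch with small, made-up sizes; `via_matmul` and `via_einsum` are illustrative names:

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes for a quick check.
batch_size, student_count, feature_dim = 3, 4, 5
student_adjacency_mat = torch.randint(0, 2, (student_count, student_count)).float()
student_feat = torch.rand(batch_size, feature_dim)
student_graph = torch.rand(student_count, feature_dim)

# Step 1: mf,jf->mj  (all pairwise dot products between feats and graph rows)
g = student_feat @ student_graph.T
# Step 2: mj,ij->mi  (weight those dot products by the adjacency rows)
via_matmul = g @ student_adjacency_mat.T

via_einsum = torch.einsum('ij,mf,jf->mi', student_adjacency_mat, student_feat, student_graph)

print(torch.allclose(via_matmul, via_einsum, atol=1e-5))
```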


Side note: your current dummy student and teacher adjacency matrices contain only zeros, because the upper bound of torch.randint is exclusive. Maybe you meant to have:

student_adjacency_mat=torch.randint(0,2,(student_count,student_count)).float()
teacher_adjacency_mat=torch.randint(0,2,(teacher_count,teacher_count)).float()
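A quick way to see the difference, using a small 4x4 shape picked only for illustration (`zeros_only` and `binary` are illustrative names):

```python
import torch

torch.manual_seed(0)

# The upper bound of torch.randint is exclusive, so (0, 1) can only yield 0.
zeros_only = torch.randint(0, 1, (4, 4))

# With (0, 2) the entries are drawn from {0, 1}; .float() converts the
# integer result so its dtype matches the float feature tensors.
binary = torch.randint(0, 2, (4, 4)).float()

print(zeros_only.sum().item(), zeros_only.dtype)
print(binary.unique(), binary.dtype)
```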

huangapple
  • Posted on February 24, 2023 at 12:55:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75552760.html