Efficient PyTorch or NumPy broadcasting not found to avoid bottleneck operations
Question
I have the following implementation in my PyTorch-based code which involves a nested for loop. The nested for loops, along with the `if` condition, make the code very slow to execute. I attempted to avoid the nested loops using the broadcasting concepts in NumPy and PyTorch, but that did not yield any result. Any help with avoiding the `for` loops will be appreciated.

Here are the links I have read: [PyTorch](https://pytorch.org/docs/stable/notes/broadcasting.html), [NumPy](https://numpy.org/doc/stable/user/basics.broadcasting.html).
#!/usr/bin/env python
# coding: utf-8
import torch
batch_size=32
mask=torch.FloatTensor(batch_size).uniform_() > 0.8
teacher_count=510
student_count=420
feature_dim=750
student_output=torch.zeros([batch_size,student_count])
teacher_output=torch.zeros([batch_size,teacher_count])
student_adjacency_mat=torch.randint(0,1,(student_count,student_count))
teacher_adjacency_mat=torch.randint(0,1,(teacher_count,teacher_count))
student_feat=torch.rand([batch_size,feature_dim])
student_graph=torch.rand([student_count,feature_dim])
teacher_feat=torch.rand([batch_size,feature_dim])
teacher_graph=torch.rand([teacher_count,feature_dim])
for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i]=student_output[m][i]+student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i]=teacher_output[m][i]+teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])
Answer 1

Score: 1
Problem statement
The operation you are looking to perform is quite straightforward. If you look closely at your loop:
for m in range(batch_size):
    if mask[m]==1:
        for i in range(student_count):
            for j in range(student_count):
                student_output[m][i] += student_adjacency_mat[i][j]*torch.dot(student_feat[m],student_graph[j])
    if mask[m]==0:
        for i in range(teacher_count):
            for j in range(teacher_count):
                teacher_output[m][i] += teacher_adjacency_mat[i][j]*torch.dot(teacher_feat[m],teacher_graph[j])
Elements relevant to us:

- You have two operations separated based on a mask, which can ultimately be computed separately.
- Each operation loops over the entire adjacency matrix, i.e. `student_count²` iterations.
- The assignment operation comes down to `output[m,i] += adj_matrix[i,j] * ⟨feats[m], graph[j]⟩`, where `adj_matrix[i,j]` is a scalar (see the sketch just below).
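In particular, all the inner `torch.dot` calls can be batched into a single matrix product; a quick sketch reusing the question's tensor names:

# every pairwise dot product torch.dot(student_feat[m], student_graph[j]) at once:
# (batch_size, feature_dim) @ (feature_dim, student_count) -> (batch_size, student_count)
dots = student_feat @ student_graph.T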
Using torch.einsum

This is a typical use case for `torch.einsum`. You can read more on this thread, where I also happen to have written an answer.

If we keep away from all implementation details, the formulation with `torch.einsum` is rather self-explanatory:

o = torch.einsum('ij,mf,jf->mi', adj_matrix, feats, graph)

In pseudo-code, this comes down to:

o[m,i] += adj_matrix[i,j]*feats[m,f]*graph[j,f]

for all `i`, `j`, `m`, and `f` in your domain of interest, which is precisely the desired operation.

Combined with the mask expanded to the appropriate form with `M = mask[:,None]`, this gives you for the student tensor:

>>> student = M*torch.einsum('ij,mf,jf->mi', student_adjacency_mat, student_feat, student_graph)

For the teacher result, you can invert the mask with `~M`:

>>> teacher = ~M*torch.einsum('ij,mf,jf->mi', teacher_adjacency_mat, teacher_feat, teacher_graph)
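As a quick sanity check, here is a minimal sketch comparing the vectorized expression against the original loop; the sizes are small hypothetical ones so the reference loop stays fast, and the adjacency matrix is cast to float so the dtypes match:

import torch

torch.manual_seed(0)
B, S, F = 4, 6, 5  # small dummy sizes, for checking only
mask = torch.rand(B) > 0.5
adj = torch.randint(0, 2, (S, S)).float()
feat = torch.rand(B, F)
graph = torch.rand(S, F)

# reference: the original nested loop (student branch only)
ref = torch.zeros(B, S)
for m in range(B):
    if mask[m]:
        for i in range(S):
            for j in range(S):
                ref[m, i] += adj[i, j] * torch.dot(feat[m], graph[j])

# vectorized version
M = mask[:, None]
out = M * torch.einsum('ij,mf,jf->mi', adj, feat, graph)

print(torch.allclose(ref, out, atol=1e-5))  # expected: True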
Using torch.matmul

Alternatively, since this is a rather simple application of `torch.einsum`, you can also get away with two calls to `torch.matmul`. Given `A` and `B`, two matrices indexed by `ik` and `kj` respectively, `A@B` corresponds to `ik@kj -> ij`. Therefore you can get the desired result with:

>>> g = student_feat@student_graph.T  # mf@jf.T -> mf@fj -> mj
>>> g@student_adjacency_mat.T         # mj@ij.T -> mj@ji -> mi

See how the two steps relate to the `torch.einsum` call with `'ij,mf,jf->mi'`: first `mf,jf->mj`, followed by `mj,ij->mi`.
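For completeness, a small sketch checking that both formulations agree; casting the adjacency matrix to float is an assumption needed so `matmul` and `einsum` operate on matching dtypes:

adj = student_adjacency_mat.float()

g = student_feat @ student_graph.T  # mf@fj -> mj
via_matmul = g @ adj.T              # mj@ji -> mi
via_einsum = torch.einsum('ij,mf,jf->mi', adj, student_feat, student_graph)

print(torch.allclose(via_matmul, via_einsum, atol=1e-4))  # expected: True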
Side note: your current dummy student and teacher adjacency matrices are initialized with zeros, since the upper bound of `torch.randint(0, 1, ...)` is exclusive and it can only ever sample 0. Maybe you meant to have:

student_adjacency_mat=torch.randint(0,2,(student_count,student_count)).float()
teacher_adjacency_mat=torch.randint(0,2,(teacher_count,teacher_count)).float()