April 13, 2023, 22:48:51

Which one of the two codes is more efficient to run on GPU?

Question

I know that branching is bad practice on a GPU when threads take different paths, so I was thinking about how to avoid it and came up with an idea. For example, suppose this task needs to run on a GPU:

// a takes values either 0 or 1
if(a) b=b+32;
barrier();

I can rewrite this code to eliminate the branch:

// a takes values either 0 or 1
b=b+a*32;

I know this example is not realistic, but as an idea, which of the two ways of writing the code would be more efficient on a GPU? (In fact, I have had practical situations where I could avoid branching by using the second method.)

I haven't actually tried either approach, but a general understanding will help me write more efficient code later on.
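
For reference, here is a minimal host-side C++ sketch of the two forms. It only checks that they compute the same result when `a` is 0 or 1, and that they stop agreeing otherwise. It says nothing about which one is faster on a GPU. The names `add_branch`, `add_branchless` and `forms_agree` are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>

// Hypothetical helper: the branching form from the question.
int add_branch(int a, int b) {
    if (a) b = b + 32;
    return b;
}

// Hypothetical helper: the arithmetic form from the question.
int add_branchless(int a, int b) {
    return b + a * 32;
}

// Returns true if both forms agree for a in {0, 1} over a small range of b.
bool forms_agree() {
    for (int a = 0; a <= 1; ++a) {
        for (int b = -100; b <= 100; ++b) {
            if (add_branch(a, b) != add_branchless(a, b)) return false;
        }
    }
    return true;
}
```

The rewrite relies on the stated precondition: with `a = 2`, the branching form adds 32 while the arithmetic form adds 64.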

Answer 1

Score: 1

This kind of optimization is normally done by the compiler, so don't worry about it.
The key is very often memory management: coalesced access, avoiding excessive reads and writes to global memory, and smart use of shared memory and local store!
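
To make the coalescing point a bit more concrete, here is a rough host-side C++ sketch that counts how many memory segments a group of threads touches for a given access stride. It is a simplified model, not a description of any particular GPU: the group size of 32 threads, the 4-byte element and the 128-byte segment are assumptions chosen for illustration, and `segments_touched` is a hypothetical helper.

```cpp
#include <cstddef>
#include <set>

// Assumed, illustrative parameters (not taken from the answer above).
constexpr std::size_t kThreads = 32;        // threads issuing a load together
constexpr std::size_t kElemBytes = 4;       // e.g. one float per thread
constexpr std::size_t kSegmentBytes = 128;  // assumed memory transaction size

// Counts the distinct segments touched when thread t reads element t * stride.
std::size_t segments_touched(std::size_t stride) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kThreads; ++t) {
        std::size_t byte_offset = t * stride * kElemBytes;
        segments.insert(byte_offset / kSegmentBytes);
    }
    return segments.size();
}
```

Under this model, adjacent threads reading adjacent elements (stride 1) fall into a single segment, while a stride of 32 elements puts every thread in its own segment. That difference in memory traffic is usually far larger than anything gained by removing a one-line branch.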
