Which one of the two codes is more efficient to run on GPU?


Question

I know that branching is bad practice on a GPU when threads take different paths. I was thinking about how to avoid branching and came up with an idea. For example, suppose this task needs to run on a GPU:

// a takes values either 0 or 1
if(a) b=b+32;
barrier();

I can rewrite this code to eliminate the branch:

// a takes values either 0 or 1
b=b+a*32;

I know this example is not realistic, but as a general idea, which of the two ways of writing the code would be more efficient on a GPU? (In fact, I have had practical situations where I could avoid branching by using the second method.)

I haven't actually tried anything yet, but a general understanding will help me write more efficient code later on.

Answer 1

Score: 1

This kind of optimization is normally done by the compiler, so don't worry about it.
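To make the point about the compiler concrete, here is a minimal sketch in plain C (standing in for the kernel body) of the two forms from the question, plus a third "select" form. The function names add_branch, add_arith, and add_select are hypothetical and used only for illustration. For a equal to 0 or 1 all three return the same value, which is why a compiler is free to pick whichever lowering it prefers; a short conditional like this is typically turned into a predicated or select instruction rather than a real jump, though that depends on the compiler and target.

```c
/* Branching form from the question: add 32 only when a is nonzero. */
int add_branch(int a, int b) {
    if (a) b = b + 32;
    return b;
}

/* Arithmetic form from the question.
   Equivalent to add_branch only when a is exactly 0 or 1. */
int add_arith(int a, int b) {
    return b + a * 32;
}

/* Select form (an illustrative addition, not from the question):
   a conditional expression with no side effects in either arm. */
int add_select(int a, int b) {
    return b + (a ? 32 : 0);
}
```

Note that the arithmetic form silently changes behavior if a can ever hold a value other than 0 or 1 (for example, a = 2 adds 64), whereas the branching and select forms treat any nonzero a the same way.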
What matters most is usually memory management: coalesced access, avoiding too many reads and writes to global memory, and making smart use of shared memory and local storage.
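As a rough illustration of why coalesced access matters, the following C sketch counts how many memory segments a group of threads touches for a given access stride. It is a simplified model under stated assumptions, not a description of any specific GPU: the 32-thread group, the 128-byte segment size, the 4-byte element size, and the function name segments_touched are all assumptions chosen for the example.

```c
#include <stddef.h>

#define WARP_SIZE     32   /* assumed number of threads issuing together */
#define SEGMENT_BYTES 128  /* assumed size of one memory transaction */
#define ELEMENT_BYTES 4    /* assumed element size (e.g. a 32-bit value) */

/* Count the distinct segments touched when thread t reads the element
   at index t * stride, for t in [0, WARP_SIZE). */
int segments_touched(size_t stride) {
    size_t seen[WARP_SIZE];
    int count = 0;
    for (int t = 0; t < WARP_SIZE; ++t) {
        size_t segment = ((size_t)t * stride * ELEMENT_BYTES) / SEGMENT_BYTES;
        int found = 0;
        for (int i = 0; i < count; ++i) {
            if (seen[i] == segment) { found = 1; break; }
        }
        if (!found) seen[count++] = segment;
    }
    return count;
}
```

In this model, segments_touched(1) is 1 because neighboring threads read neighboring elements that all fall in one segment, while segments_touched(32) is 32 because every thread lands in its own segment. That difference in memory traffic is usually far larger than anything gained by removing a small branch.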

huangapple
  • Posted on April 13, 2023, 22:48:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006856.html