Which of these two versions of the code is more efficient to run on a GPU?
Question
I know that branching is bad practice on a GPU when threads take different paths, so I was thinking about how to avoid it and came up with an idea. For example, suppose this task needs to run on the GPU:
// a takes values either 0 or 1
if(a) b=b+32;
barrier();
I can rewrite this code to remove the branch:
// a takes values either 0 or 1
b=b+a*32;
I know this example is not realistic, but as a general idea: which of the two ways of writing the code would be more efficient on a GPU? (I have in fact had practical situations where I could avoid branching by using the second method.)
I haven't actually tried either one, but a general understanding will help me write more efficient code later on.
Answer 1
Score: 1
This kind of optimization is (normally) done by the compiler. Don't worry about it.
The key is very often memory management: coalesced accesses, avoiding too many reads and writes to global memory, and smart use of shared memory and local store!
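To make the compiler point concrete, here is a minimal sketch in plain C of the two forms from the question, plus a third "select" form. The function names are hypothetical and only for illustration. For such a short conditional body, GPU compilers typically emit a predicated or select instruction rather than a real divergent branch, so all three usually end up costing about the same; that is an expectation, not a guarantee, and it depends on the compiler and target.

```c
#include <assert.h>

/* Hypothetical helper: the branching form from the question. */
static int add_with_branch(int a, int b) {
    if (a) b = b + 32;
    return b;
}

/* Hypothetical helper: the arithmetic form from the question.
   Matches the branching form only when a is exactly 0 or 1. */
static int add_branchless(int a, int b) {
    return b + a * 32;
}

/* Hypothetical helper: a select form, which compilers commonly
   lower to a conditional move or predicated instruction. */
static int add_select(int a, int b) {
    return b + (a ? 32 : 0);
}

/* Checks that the three forms agree for a in {0, 1}. */
static void check_forms_agree(void) {
    for (int a = 0; a <= 1; ++a) {
        for (int b = -64; b <= 64; ++b) {
            assert(add_with_branch(a, b) == add_branchless(a, b));
            assert(add_with_branch(a, b) == add_select(a, b));
        }
    }
}
```

Note that the arithmetic form silently changes behavior if `a` can ever be something other than 0 or 1 (for `a = 2` it adds 64, while the branching form adds 32), which is one more reason to keep the readable version and let the compiler handle it.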