Should thread_block type be passed by reference?
Question
When passing `thread_group`-type objects to a device function, is there a preference for passing by reference vs. passing by value?
- Is one of them "correct"?
- What are the differences between the two approaches?
- When should each approach be preferred?
Examples
Similar examples in the programming-guide and the developer blog seem to handle this differently.
Programming Guide
```cuda
__device__
int sum(const thread_block& g, int *x, int n) {
    // ...
    g.sync();
    return total;
}
```
Developer Blog
```cuda
__device__
int sum(thread_block block, int *x, int n) {
    // ...
    block.sync();
    // ...
    return total;
}
```
Additional Info
The programming-guide also has this to say about constructing implicit groups:
> Although you can create an implicit group anywhere in the code, it is dangerous to do so. Creating a handle for an implicit group is a collective operation—all threads in the group must participate. If the group was created in a conditional branch that not all threads reach, this can lead to deadlocks or data corruption. For this reason, it is recommended that you create a handle for the implicit group upfront (as early as possible, before any branching has occurred) and use that handle throughout the kernel. Group handles must be initialized at declaration time (there is no default constructor) for the same reason and copy-constructing them is discouraged.
This leads me to believe that passing them by reference is preferred, but I admit there is enough detail underlying the various cooperative groups that I have likely missed some nuance. Would passing by value count as "copy-constructing" and therefore be discouraged?
I have not noticed any difference in performance or results with either approach, but I may simply not have tested the right edge case, or the "undefined behavior" may just happen to work out in a way that doesn't cause a problem.
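The practice the quoted passage recommends, creating the implicit-group handle once at kernel entry before any branching and reusing it throughout, can be sketched in CUDA C++ as follows, with the handle handed to a helper by const reference. This is a minimal illustration under stated assumptions, not code from the guide or the blog: the names `block_sum`, `sum_kernel` and `gpu_block_sum` are hypothetical, the shared-memory tree reduction is chosen only for brevity, and it assumes a single block whose size is a power of two (at most 1024).

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper: shared-memory tree reduction over the whole block.
// The handle arrives by const reference, so no copy of the group is made.
// Assumes g.size() is a power of two and x holds g.size() elements.
__device__ int block_sum(const cg::thread_block& g, int* x) {
    unsigned int rank = g.thread_rank();
    for (unsigned int half = g.size() / 2; half > 0; half /= 2) {
        if (rank < half) {
            x[rank] += x[rank + half];
        }
        g.sync();  // reached by every thread: it sits outside the branch
    }
    return x[0];
}

// Hypothetical kernel: one block, one input element per thread.
__global__ void sum_kernel(const int* in, int* out) {
    // Create the implicit-group handle once, before any branching,
    // and reuse it for the rest of the kernel.
    cg::thread_block block = cg::this_thread_block();

    extern __shared__ int scratch[];
    scratch[block.thread_rank()] = in[block.thread_rank()];
    block.sync();

    int total = block_sum(block, scratch);
    if (block.thread_rank() == 0) {
        *out = total;
    }
}

// Hypothetical host wrapper; error checking omitted for brevity.
// values.size() must be a power of two no larger than 1024.
int gpu_block_sum(const std::vector<int>& values) {
    int n = static_cast<int>(values.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    sum_kernel<<<1, n, n * sizeof(int)>>>(d_in, d_out);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

For example, `gpu_block_sum({1, 2, 3, 4})` should return 10. Keeping `g.sync()` outside the `if` matters for a related reason: a collective call placed in a branch that some threads of the group skip can hang or misbehave.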
Answer 1
Score: 2
A few observations first:
- The blog post you refer to is from 2017, when the feature was in preview; the documentation is current. On that basis alone you should favour the const pass-by-reference idiom, because the source is newer.
- As you have seen for yourself, because CUDA uses a heavily stripped-back implementation of the C++ object model, and the compiler aggressively inlines functions for performance, it is very unlikely that you would find real-world cases where the compiler generates different code for the two forms.
As a result, I would opine that the const pass-by-reference version is what you should use, both from a C++ language-correctness point of view and because the current documentation suggests it. There are probably corner cases where someone, somewhere, sometime, got burned by copy construction in the pass-by-value version, but I suspect you would have to try very hard for that to happen. Caveat emptor and all that...
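To see this in practice, the sketch below writes one helper in both forms, the const-reference signature from the current guide and the by-value signature from the blog post, and calls each from the same kernel. The names `block_total_ref`, `block_total_val`, `compare_kernel` and `run_compare` are hypothetical; it assumes a single block of at most 1024 threads and omits error checking. Both results should equal 1 + 2 + ... + threads.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <utility>

namespace cg = cooperative_groups;

// Hypothetical helper, const-reference form (as in the current programming guide).
__device__ int block_total_ref(const cg::thread_block& g, int* acc, int v) {
    if (g.thread_rank() == 0) *acc = 0;
    g.sync();           // acc is zeroed before anyone adds to it
    atomicAdd(acc, v);
    g.sync();           // every contribution has landed before anyone reads it
    return *acc;
}

// The same helper in pass-by-value form (as in the 2017 blog post).
// The body is identical; only the parameter type differs.
__device__ int block_total_val(cg::thread_block g, int* acc, int v) {
    if (g.thread_rank() == 0) *acc = 0;
    g.sync();
    atomicAdd(acc, v);
    g.sync();
    return *acc;
}

// Hypothetical kernel that calls both forms, each with its own accumulator.
__global__ void compare_kernel(int* out_ref, int* out_val) {
    cg::thread_block block = cg::this_thread_block();
    __shared__ int acc_ref;
    __shared__ int acc_val;

    int v = static_cast<int>(block.thread_rank()) + 1;
    int r = block_total_ref(block, &acc_ref, v);
    int s = block_total_val(block, &acc_val, v);

    if (block.thread_rank() == 0) {
        *out_ref = r;
        *out_val = s;
    }
}

// Hypothetical host wrapper: one block of `threads` threads (at most 1024).
// Returns {by-reference result, by-value result}; error checking omitted.
std::pair<int, int> run_compare(int threads) {
    int* d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(int));
    compare_kernel<<<1, threads>>>(d_out, d_out + 1);

    int h_out[2] = {0, 0};
    cudaMemcpy(h_out, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return {h_out[0], h_out[1]};
}
```

`run_compare(256)` should give 32896 for both members. To compare the generated code rather than just the results, one option is to compile to a cubin (e.g. `nvcc -cubin -arch=sm_80 compare.cu -o compare.cubin`, substituting your own architecture and file name) and dump the SASS with `cuobjdump -sass compare.cubin`. Whether the two call sites end up identical depends on the compiler version and on inlining, so treat it as something to check rather than a guarantee.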