When using thrust, is it legitimate to create a std::array inside a __host__ __device__ functor?
Question
I wrote some toy code to test a few ideas:
#include <thrust/transform.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <array>
#include <vector>
#define N 20
struct func {
        __host__ __device__
        float operator()(float x) { return x*2; }
};
template <typename S>
struct O {
        const std::array<float,2> a;
        O(std::array<float,2> a): a(a) {}
        S f;
        __host__ __device__
        float operator()(float &v) {
                std::array<int,3> b = {2,3,4};
                int tmp;
                for (int i=0; i<3; i++) {
                        tmp = thrust::reduce(thrust::device,b.begin(),b.end(),0);
                        printf("%d",tmp);
                }
                return a[0]*v + a[1] + f(a[0]);
        }
};
int main(void) {
        thrust::host_vector<float> _v1(N);
        thrust::device_vector<float> v1 = _v1, v2;
        thrust::fill(v1.begin(),v1.end(),12);
        v2.resize(N);
        std::array<float,2> a{1,2};
        auto c_itor = thrust::make_counting_iterator(0);
        thrust::transform(v1.begin(),v1.end(),v2.begin(),O<func>(a));
        thrust::copy(v2.begin(),v2.end(),std::ostream_iterator<float>(std::cout," "));
}
This code compiles and runs fine with nvcc --expt-relaxed-constexpr -std=c++17. As you can see, std containers such as std::array appear inside a __host__ __device__ functor. What I want to know is:
- Is this way of writing it legitimate? (in terms of efficiency, not syntactic validity)
- Since the code runs correctly, where are the std objects stored? (device or host)
Answer 1
Score: 2
The special case of using std::array with C++17 or later and --expt-relaxed-constexpr works because std::array is a very thin wrapper around a C-style array, and in C++17 all of the member functions you use are constexpr. As far as I know, every member function except std::array::fill and std::array::swap is constexpr in C++17; those two only got the constexpr treatment in C++20.
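As a minimal sketch of where that constexpr boundary shows up (assuming nvcc with -std=c++17 and --expt-relaxed-constexpr; the helper function name first_element is made up for illustration):
#include <array>
// Sketch: constexpr members of std::array (e.g. operator[]) are callable from
// device code under --expt-relaxed-constexpr, while std::array::fill is not
// constexpr until C++20, so nvcc would flag it as a host-only call here.
__host__ __device__ float first_element()
{
        std::array<float, 2> a{1.0f, 2.0f};
        // a.fill(0.0f);   // not constexpr in C++17 -> rejected in device code
        return a[0];       // operator[] is constexpr in C++17 -> fine
}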
So in terms of performance your code should behave the same as if you had used float a[2] and int b[3]. That means the values are kept in registers where possible (this depends on the loop over b being unrolled and on register pressure in general), which is fine as long as you don't go overboard with the size of the arrays. See e.g. this answer for a deeper discussion of arrays, registers, and local memory.
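To illustrate that equivalence, here is a minimal sketch of the question's functor rewritten with plain C-style arrays (the name O_raw is made up for illustration; the nested thrust::reduce is replaced by a plain loop):
#include <thrust/transform.h>
#include <thrust/device_vector.h>
#include <cstdio>
// Sketch: the same functor with float a[2] and int b[3] instead of std::array.
// With C++17 and --expt-relaxed-constexpr both versions should compile to
// essentially the same code; the small arrays end up in registers if possible.
struct O_raw {
        float a[2];                      // coefficients, copied by value to the device
        __host__ __device__
        float operator()(float v) const {
                int b[3] = {2, 3, 4};    // small fixed-size array
                int tmp = 0;
                for (int i = 0; i < 3; i++)
                        tmp += b[i];     // plain loop instead of the nested thrust::reduce
                printf("%d", tmp);       // same debug output as the question's functor
                return a[0]*v + a[1] + 2*a[0];   // f(x) = 2*x from the question, inlined
        }
};
// usage sketch:
// thrust::device_vector<float> v1(20, 12.0f), v2(20);
// thrust::transform(v1.begin(), v1.end(), v2.begin(), O_raw{{1.0f, 2.0f}});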
Other Containers / Alternatives:
For other STL containers that use dynamic memory you probably won't be as lucky in terms of member functions being constexpr. The HPC nvc++ compiler (formerly the PGI C++ compiler) does not need __device__ markers, so in theory one can use a lot more STL functionality in device code, but in most cases that is a bad idea in terms of performance. STL functions must also still conform to CUDA's C++ Language Restrictions.
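A minimal sketch of the usual workaround for dynamically sized data, which the answer does not spell out: keep it in a thrust::device_vector on the host side and hand the functor only a raw device pointer and a size (the names scale_by_table and coeffs are made up for illustration):
#include <thrust/transform.h>
#include <thrust/device_vector.h>
#include <vector>
// Sketch: instead of a dynamic STL container inside the functor, the data
// lives in a thrust::device_vector and the functor receives a raw pointer.
struct scale_by_table {
        const float *table;   // raw device pointer, valid while the vector is alive
        int n;
        __host__ __device__
        float operator()(float v) const {
                float s = 0.0f;
                for (int i = 0; i < n; i++)
                        s += table[i];
                return s*v;
        }
};
// usage sketch:
// std::vector<float> h = {0.5f, 0.25f, 0.25f};
// thrust::device_vector<float> coeffs(h.begin(), h.end());
// thrust::device_vector<float> in(20, 1.0f), out(20);
// thrust::transform(in.begin(), in.end(), out.begin(),
//         scale_by_table{thrust::raw_pointer_cast(coeffs.data()), (int)coeffs.size()});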
Nvidia is developing its own C++ standard library implementation, libcu++, with its own device-side extensions. There are no containers yet, but they might come in the future. For hash tables there is the cuCollections library (work in progress).

