How to handle an output of uncertain size in CUDA

Question

I have a large array that I want to put in CUDA for some conditional evaluations, and I want to output the values that satisfy the conditions. However, before the evaluations, I don't know the total number of output values. I only know that it will be less than 30% of the total count. Therefore, I want to know how to write the code to minimize memory usage as much as possible.

Answer 1

Score: 1

To put my comment into code, this is how it could be done using Thrust.

    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/count.h>
    #include <thrust/copy.h>
    #include <iostream>
    #include <cassert>

    // predicate callable from both host and device code
    struct IsEven{
        __host__ __device__
        bool operator()(int i) const{
            return i % 2 == 0;
        }
    };

    int main(){
        //fill data with 0,1,2,3,4
        thrust::device_vector<int> d_data(5);
        thrust::sequence(d_data.begin(), d_data.end(), 0);

        //count number of even elements
        int numEvenElements = thrust::count_if(d_data.begin(), d_data.end(), IsEven{});

        //allocate memory of exactly the required size
        thrust::device_vector<int> d_evenElements(numEvenElements);

        //copy even elements
        auto copyend = thrust::copy_if(d_data.begin(), d_data.end(), d_evenElements.begin(), IsEven{});
        assert(copyend == d_evenElements.end());

        //print the result (each access copies one element from device to host)
        for(int i = 0; i < numEvenElements; i++){
            std::cout << d_evenElements[i] << "\n";
        }
    }

In general, to count the number of elements satisfying a condition, you would want to create a range where for each input element you put 1 if the condition is true and 0 otherwise. Then, simply perform a parallel sum reduction over this range. With fancy iterators like a thrust::transform_iterator, the range does not need to be materialized in memory. thrust::count_if does all that for you.

There are other approaches that depend on how much memory you actually have available:

Another approach, which avoids keeping an output buffer that is mostly empty, is to allocate a temporary buffer of 100% of the input size. Compact the selected elements into this large buffer, then copy them to a small buffer of exact size and free the large buffer. Of course, this fails if either buffer cannot be allocated.

A third approach would be to use the CUDA virtual memory management driver API to allocate a buffer of 30% of the input size, and manually release the unoccupied memory pages after compaction.

huangapple
  • Posted on May 29, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76355408.html