How to handle an output of uncertain size in CUDA
Question
I have a large array that I want to put in CUDA for some conditional evaluations, and I want to output the values that satisfy the conditions. However, before the evaluations, I don't know the total number of output values. I only know that it will be less than 30% of the total count. Therefore, I want to know how to write the code to minimize memory usage as much as possible.
Answer 1
Score: 1
To put my comment into code, this is how it could be done using Thrust.
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/count.h>
#include <thrust/copy.h>   // thrust::copy_if
#include <iostream>
#include <cassert>

// Predicate usable on both host and device
struct IsEven{
    __host__ __device__
    bool operator()(int i) const{
        return i % 2 == 0;
    }
};

int main(){
    // fill data with 0,1,2,3,4
    thrust::device_vector<int> d_data(5);
    thrust::sequence(d_data.begin(), d_data.end(), 0);

    // first pass: count the number of even elements
    int numEvenElements = thrust::count_if(d_data.begin(), d_data.end(), IsEven{});

    // allocate an output buffer of exactly that size
    thrust::device_vector<int> d_evenElements(numEvenElements);

    // second pass: copy the even elements
    auto copyend = thrust::copy_if(d_data.begin(), d_data.end(), d_evenElements.begin(), IsEven{});
    assert(copyend == d_evenElements.end());

    // print the result (each d_evenElements[i] access copies one element from device to host)
    for(int i = 0; i < numEvenElements; i++){
        std::cout << d_evenElements[i] << "\n";
    }
}
In general, to count the number of elements satisfying a condition, you would create a range that holds, for each input element, 1 if the condition is true and 0 otherwise, and then perform a parallel sum reduction over this range. With fancy iterators like a thrust::transform_iterator, the range does not need to be materialized in memory. thrust::count_if does all that for you.
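To make the "map to 0/1, then sum" idea concrete, here is a minimal host-only C++17 sketch. It uses std::transform_reduce from the standard library as a stand-in for the Thrust machinery, so it runs without a GPU. The function name countByReduction is hypothetical and is not part of Thrust or of the answer's code.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Same predicate as in the answer, without the CUDA qualifiers (host-only sketch).
struct IsEven {
    bool operator()(int i) const { return i % 2 == 0; }
};

// Hypothetical helper: counts matching elements as a map (predicate -> 0 or 1)
// fused with a sum reduction, so the 0/1 range is never stored in memory.
template <class Pred>
std::size_t countByReduction(const std::vector<int>& data, Pred pred) {
    return std::transform_reduce(
        data.begin(), data.end(),
        std::size_t{0},   // initial value of the sum
        std::plus<>{},    // reduction step: add the flags
        [pred](int x) { return pred(x) ? std::size_t{1} : std::size_t{0}; });
}
```

Calling countByReduction on the values 0,1,2,3,4 with IsEven{} gives the same count that thrust::count_if produces in the program above; on the device, Thrust does the mapping and the reduction in parallel.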
There are other approaches that depend on how much memory you actually have available:
Another approach that still ends with an output buffer that is not mostly empty is to allocate a temporary buffer of 100% of the input size, compact the selected elements into it, copy them to a second buffer of exactly the required size, and then free the large buffer. Of course, this fails if one of the buffers cannot be allocated.
A third approach would be to use the virtual memory driver API to allocate a 30% buffer, and manually release the unoccupied memory pages after compaction.
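The compact-then-shrink approach (the one that uses a full-size temporary buffer) can be sketched as below. This is a host-only C++ illustration of the order of operations, assuming std::vector and std::copy_if as stand-ins for device buffers and thrust::copy_if. The function name compactThenShrink is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper illustrating the "large buffer, then exact buffer" sequence.
template <class Pred>
std::vector<int> compactThenShrink(const std::vector<int>& input, Pred pred) {
    // Step 1: temporary buffer as large as the input (worst case: everything matches).
    std::vector<int> large(input.size());

    // Step 2: single pass that compacts the selected elements to the front.
    auto last = std::copy_if(input.begin(), input.end(), large.begin(), pred);

    // Step 3: copy the compacted prefix into a buffer sized to the number of selected elements.
    std::vector<int> exact(large.begin(), last);

    // Step 4: `large` is released when it goes out of scope at return.
    return exact;
}
```

Unlike the count-then-copy version, this evaluates the predicate only once per element, but the large and the exact buffer exist at the same time for a moment, which is why it fails when either allocation does not fit.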