Does `std::vector`'s iterator constructor copy the data?
Question
Inside a function call, I have a dynamically allocated array that I want to fill a vector with. The context here is that I know I can't return a pointer because it goes out of scope after the return.
My question is about the safety of calling `free` on the created pointer once the vector has been constructed. Does the vector take ownership of the pointer, and therefore responsibility for destroying it, or does it simply copy its data? My concern is that if I call `free` and the vector's underlying array is the same memory as the original, I will destroy its contents early.
```cpp
__host__
std::vector<float> mult(std::vector<float> x, float scalar) {
    int n = x.size();
    int n_threads = 256;
    // Round up. (int)ceil(n / n_threads) does integer division first, so it truncates.
    int n_blocks = (n + n_threads - 1) / n_threads;
    size_t bytes = n * sizeof(float);

    float *d_x; // gpu "device" inputs
    float *d_y; // gpu "device" outputs
    float *h_y; // cpu "host" outputs
    h_y = (float *)malloc(n * sizeof(float));

    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    mult<<<n_blocks, n_threads>>>(n, d_x, scalar, d_y);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);

    std::vector<float> y(h_y, h_y + sizeof(h_y));
    free(h_y); // <<<< CALL IN QUESTION
    return y;
}
```
The docs say:
> (3) range constructor Constructs a container with as many elements as
> the range [first,last), with each element emplace-constructed from its
> corresponding element in that range, in the same order.
I googled "emplace constructed" but found no definition of the term, though I feel like it is the key to my question.
Answer 1
Score: 0
The bit that is interesting:
```cpp
// Step 1: Allocate memory.
float *h_y; // cpu "host" outputs
h_y = (float *)malloc(n * sizeof(float));
// Step 2: Copy data from source into allocated memory
cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
// Step 3: Copy data from allocated memory to vector
// With a bug.
std::vector<float> y(h_y, h_y + sizeof(h_y));
// Step 4: Free allocated memory.
free(h_y);
```
Your initial question: does "Step 4: free(h_y)" mess with the vector's memory?
Short Answer: No.
Long Answer: Containers (like `std::vector`) manage their own memory. The vector allocates its own storage and copies the data into it; it never takes ownership of `h_y`. So you still have to free `h_y` yourself, and doing so does not affect the vector.
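A minimal host-only sketch of the long answer, assuming no CUDA is involved and using hypothetical helper names (`copy_then_free`, `uses_own_storage`): the range constructor copies the floats into storage the vector allocates itself, so freeing the original `malloc` buffer afterwards is safe.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper: copies n floats out of a malloc'd buffer into a vector,
// then frees the buffer. Safe, because the range constructor allocates the
// vector's own storage and copies the elements into it; the vector never
// takes ownership of buf.
std::vector<float> copy_then_free(float *buf, int n) {
    std::vector<float> y(buf, buf + n); // element-wise copy into new storage
    std::free(buf);                     // buf is ours to free; y is unaffected
    return y;
}

// Hypothetical helper: returns true if a vector built from buf uses a
// different block of memory than buf itself.
bool uses_own_storage(const float *buf, int n) {
    std::vector<float> y(buf, buf + n);
    return n > 0 && y.data() != buf;
}
```

For example, after a `malloc`'d buffer `h` of four floats is filled, `copy_then_free(h, 4)` returns a vector holding the same four values; `h` itself is dangling afterwards and must not be used again.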
BUT: You have a bug:
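```cpp
std::vector<float> y(h_y, h_y + sizeof(h_y));
// sizeof(h_y) is the size of the pointer (typically 8 bytes on a 64-bit platform),
// not the size of the allocated memory.
// Pointer arithmetic counts elements, not bytes, so h_y + 8 is 8 floats in:
// the vector always gets sizeof(float*) floats regardless of n,
// and the copy reads past the end of the buffer if n is smaller than that.
```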
You probably wanted:
```cpp
std::vector<float> y(h_y, h_y + n);
```
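A small host-only sketch of the size bug, with hypothetical function names and a stack array standing in for `h_y`: `buggy_copy` returns `sizeof(float*)` elements (8 on a typical 64-bit platform) no matter how large the buffer is, while `fixed_copy` returns exactly the `n` elements you pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reproduction of the question's bug: sizeof(p) is the size of
// the pointer itself (in bytes), but p + k advances k *elements*, so this
// always copies sizeof(float*) floats, whatever the real buffer length is.
// Only valid to call when the buffer holds at least sizeof(float*) floats.
std::vector<float> buggy_copy(const float *p) {
    return std::vector<float>(p, p + sizeof(p));
}

// The fix: pass the element count that was actually allocated.
std::vector<float> fixed_copy(const float *p, std::size_t n) {
    return std::vector<float>(p, p + n);
}
```

Since the size of a pointer carries no information about the allocation behind it, the element count has to travel alongside the pointer, which is exactly what `n` does in the corrected line.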
Emplace Constructed:
This is just a fancy way of saying that the vector does not default-construct each element and then copy-assign into it. Instead, each element is constructed "in place" in the vector's own storage, by calling its constructor with the corresponding item from the range as the argument.
Since you have `float`, there is no real constructor to run (copy-constructing a `float` is trivial), so each value from the range is simply copied directly into the vector's allocated memory. Either way, the elements end up in memory the vector allocated, not in your buffer.
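A sketch of what "emplace-constructed" means in practice, using a hypothetical `Tracer` type that counts its special member function calls. Building a vector from a 3-element range (a pointer range, so random-access iterators) runs the copy constructor 3 times and never the default constructor or copy assignment. For `float` the same copy construction is trivial, which the `static_assert` checks.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Tally of how Tracer objects come into being (illustration only).
struct Counts {
    int default_ctor = 0;
    int copy_ctor = 0;
    int copy_assign = 0;
};

Counts counts; // global tally, reset before each measurement

// Hypothetical type that records which special member functions run.
struct Tracer {
    int value;
    Tracer() : value(0) { ++counts.default_ctor; }
    explicit Tracer(int v) : value(v) {}
    Tracer(const Tracer &other) : value(other.value) { ++counts.copy_ctor; }
    Tracer &operator=(const Tracer &other) {
        value = other.value;
        ++counts.copy_assign;
        return *this;
    }
};

// Builds a vector from [first, first + n) with the range constructor and
// reports which special member functions were called while doing so.
Counts build_from_range(const Tracer *first, std::size_t n) {
    counts = Counts{};                       // reset the tally
    std::vector<Tracer> v(first, first + n); // each element copy-constructed in place
    return counts;
}

// For float, "constructing" an element is just a trivial copy of the value.
static_assert(std::is_trivially_copy_constructible<float>::value,
              "float copy construction is trivial");
```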
What you should be doing:
Don't allocate a temporary buffer. Make sure the vector is large enough, then copy directly from CUDA into the vector.
```cpp
// Step 1: Allocate memory (in the vector)
std::vector<float> y(n);
// Step 2: Copy data from source into allocated memory
cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
// Note: y.data() is the address of the first element in the vector
// (the same as &y[0], but also safe to call when n == 0).
// Vector storage is contiguous, so cudaMemcpy can write straight into it.
```
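A host-only sketch of the "size first, then copy into the vector" pattern. `copy_from_device` and `fetch_results` are hypothetical names, and `std::memcpy` stands in for `cudaMemcpy(..., cudaMemcpyDeviceToHost)` only so the sketch runs on a machine without CUDA.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost) so the
// sketch runs without a GPU; with CUDA you would call cudaMemcpy here.
void copy_from_device(void *dst, const void *src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}

// Sizes the vector first, then copies straight into its contiguous storage:
// no temporary malloc buffer and no free() to get wrong.
std::vector<float> fetch_results(const float *d_y, std::size_t n) {
    std::vector<float> y(n); // allocates n floats (zero-initialized)
    if (n > 0) {             // y.data() may be null when n == 0
        copy_from_device(y.data(), d_y, n * sizeof(float));
    }
    return y;
}
```

The only cost compared with the `malloc` version is that `std::vector<float> y(n)` zero-initializes the floats before they are overwritten; in exchange, the vector owns the memory from the start, so there is nothing left to free by hand.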