Does `std::vector`'s iterator constructor copy the data?

Question

Inside a function call, I have a dynamically allocated array that I want to fill a vector with. The context here is that I know I can't return a pointer because it goes out of scope after the return.

My question is about the safety of calling free on the created pointer once the vector has been constructed. Does the vector take ownership of the pointer, and therefore responsibility for destruction, or simply copy its data? My concern is that if I call free and the vector's underlying array is the same memory as the original, then I will destroy its contents early.

    __host__
    std::vector<float> mult(std::vector<float> x, float scalar) {

        int n = x.size();

        int n_threads = 256;
        // Integer ceiling division: ceil(n / n_threads) would truncate first
        int n_blocks = (n + n_threads - 1) / n_threads;
        size_t bytes = n * sizeof(float);

        float *d_x; // gpu "device" inputs
        float *d_y; // gpu "device" outputs
        float *h_y; // cpu "host" outputs

        h_y = (float *)malloc(n * sizeof(float));
        cudaMalloc(&d_x, bytes);
        cudaMalloc(&d_y, bytes);

        cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
        mult<<<n_blocks, n_threads>>>(n, d_x, scalar, d_y);
        cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

        cudaFree(d_x);
        cudaFree(d_y);

        std::vector<float> y(h_y, h_y + sizeof(h_y));

        free(h_y); // <<<< CALL IN QUESTION
        return y;
    }

The docs say:

> (3) range constructor Constructs a container with as many elements as
> the range [first,last), with each element emplace-constructed from its
> corresponding element in that range, in the same order.

I googled "emplace constructed" but found no definition of the term, though I feel it is the key to my question.

Answer 1

Score: 0

The bit that is interesting:

    // Step 1: Allocate memory.
    float *h_y; // cpu "host" outputs
    h_y = (float *)malloc(n * sizeof(float));


    // Step 2: Copy data from source into allocated memory
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    // Step 3: Copy data from allocated memory to vector
    //         With a bug.
    std::vector<float> y(h_y, h_y + sizeof(h_y));

    // Step 4: Free Allocated memory.
    free(h_y);

Your initial question: does "Step 4: free(h_y)" mess with the vector's memory?

Short Answer: No.
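Long Answer: Containers (like vector) manage their own memory. The range constructor allocates room of its own and copies the data into it; the vector never takes ownership of h_y. So you are still responsible for freeing h_y, and freeing it after the vector has been constructed is correct.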

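BUT: You have a bug:

    std::vector<float> y(h_y, h_y + sizeof(h_y));
    // sizeof(h_y) is the size of the pointer (probably 8 bytes),
    // not the size of the allocated memory.
    // Pointer arithmetic counts in elements, so this copies
    // sizeof(float*) floats (8 on a typical 64-bit system), not n.
    // If n is smaller than that, it also reads past the end of h_y.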

You probably wanted:

    std::vector<float> y(h_y, h_y + n);

Emplace Constructed:

This is just a fancy way of saying that the vector avoids default-constructing each element and then assigning to it. Instead, it makes sure each object in the vector is constructed "in place" in the vector's own storage, by calling the constructor with the corresponding item of the range (*it) as its argument.

Since float is a trivial type with no user-defined constructor, this simply copies each value from the range directly into the vector's allocated memory.

What you should be doing:

Don't allocate a temporary buffer. Just make sure the vector is large enough, and copy directly from the device into your vector.

    // Step 1: Allocate memory (in the vector)
    std::vector<float> y(n);

    // Step 2: Copy data from source into allocated memory
    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);

    // Note: y.data() is the address of the first element in the vector
    //       (a vector's elements are contiguous). Unlike &y[0], it is
    //       also valid when the vector is empty.

huangapple
  • Posted on June 30, 2023 at 01:04:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76583194.html