Calling tensor.packed_accessor32() throws memory error
Problem
This issue is occurring because you are trying to create packed tensor accessors for vertices inside the measure_distance_cuda function, but the tensor vertices was originally created on the CPU (torch::kCUDA was not specified when creating it). To use GPU-specific operations like packed tensor accessors, the tensor needs to be on the GPU.

You can fix this by ensuring that vertices is on the GPU before creating the accessor. Here's the modified code:
at::Tensor measure_distance_cuda(at::Tensor vertices) {
    // Move vertices to the CUDA device
    vertices = vertices.to(torch::kCUDA);

    // Rest of your code remains the same
    // ...

    // Now, you can safely create packed tensor accessors for vertices
    at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();

    // ...
    return distances;
}
By moving vertices to the CUDA device using to(torch::kCUDA), you ensure that it's compatible with GPU operations, and you should no longer encounter the error when creating the accessor.
Problem Summary
Inside my main method, I create some tensors and pass them to the function measure_distance_cuda. From there, I try to create accessors to pass to a kernel that I've written (removed for the minimal working example). However, when creating the accessors using tensor.packed_accessor32<>() I get the following runtime error coming from TensorBase.h:
Exception has occurred: CPP/c10::Error
Unhandled exception at 0x00007FF8071ECF19 in cuda_test.exe: Microsoft C++ exception: c10::Error at memory location 0x0000004A42CFE4F0.
What I've tried:
My first thought was that memory errors are weird and can point to the wrong line, so I removed the call to the CUDA kernel that would actually use the accessors. So no indexing is occurring whatsoever, yet the error persists.
Minimal reproducible code
My main function:
#include <iostream>
#include <ATen/ATen.h>
#include <torch/types.h>
#include "raycast_cuda.cuh"

int main() {
    auto vert_options = at::TensorOptions().dtype(torch::kFloat64).device(torch::kCUDA);
    torch::Tensor vertices = torch::tensor(
        {{-1, 1, 0},
         {1, 1, 0},
         {-1, -1, 0}}, vert_options
    );

    at::Tensor distances = measure_distance_cuda(vertices);
    std::cout << distances << std::endl;
}
raycast_cuda.cu
#include <cuda.h>
#include <cuda_runtime.h>
#include <ATen/ATen.h>
#include <torch/types.h>

__host__
at::Tensor measure_distance_cuda(at::Tensor vertices) {
    // get return tensor and accessor ****NO ERROR HERE****
    // (n_rays and n_faces are defined elsewhere; trimmed for the MWE)
    at::TensorOptions return_tensor_options = at::TensorOptions().device(torch::kCUDA);
    at::Tensor distances = at::zeros({n_rays, n_faces}, return_tensor_options);
    at::PackedTensorAccessor32<float_t, 2> d_acc = distances.packed_accessor32<float_t, 2>();

    // get accessors for inputs ****ERROR HAPPENS HERE****
    at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();

    return distances;
}
Some thoughts:
- I noted that creating an accessor for the return values (distances) gives me no issues. It's only angry at me for trying it on the tensors I passed into the function, so I'm suspicious that I'm doing something in the wrong scope.
Why is this happening?
Answer 1 (score: 1)
I got a quick answer on the PyTorch forums... The answer was simple: I'm declaring my inputs as kFloat64, which corresponds to double_t, not float_t.
auto vert_options = at::TensorOptions().dtype(torch::kFloat64).device(torch::kCUDA);
should be
auto vert_options = at::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
so that I can call
at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();