Calling tensor.packed_accessor32() throws memory error
Problem
This issue is occurring because you are trying to create packed tensor accessors for vertices inside the measure_distance_cuda function, but the tensor vertices was originally created on the CPU (torch::kCUDA was not specified when creating it). To use GPU-specific operations like packed tensor accessors, the tensor needs to be on the GPU.

You can fix this by ensuring that vertices is on the GPU before creating the accessor. Here's the modified code:
at::Tensor measure_distance_cuda(at::Tensor vertices) {
    // Move vertices to the CUDA device
    vertices = vertices.to(torch::kCUDA);

    // Rest of your code remains the same
    // ...

    // Now, you can safely create packed tensor accessors for vertices
    at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();

    // ...
    return distances;
}
By moving vertices to the CUDA device using to(torch::kCUDA), you ensure that it's compatible with GPU operations, and you should no longer encounter the error when creating the accessor.
Problem Summary
Inside my main method, I create some tensors and pass them to the function measure_distance_cuda. From there, I try to create accessors to pass to a kernel that I've written (removed for the minimal working example). However, when creating the accessors using tensor.packed_accessor32<>() I get the following runtime error coming from TensorBase.h:
Exception has occurred: CPP/c10::Error
Unhandled exception at 0x00007FF8071ECF19 in cuda_test.exe: Microsoft C++ exception: c10::Error at memory location 0x0000004A42CFE4F0.
What I've tried:
My first thought was that memory errors are weird and can point to the wrong line, so I removed the call to the CUDA kernel that would actually use the accessors. So no indexing is occurring whatsoever, yet the error persists.
Minimal reproducible code
My main function:
#include <iostream>
#include <ATen/ATen.h>
#include <torch/types.h>
#include "raycast_cuda.cuh"

int main() {
    auto vert_options = at::TensorOptions().dtype(torch::kFloat64).device(torch::kCUDA);
    torch::Tensor vertices = torch::tensor(
        {{-1, 1, 0},
         {1, 1, 0},
         {-1, -1, 0}}, vert_options
    );

    at::Tensor distances = measure_distance_cuda(vertices);
    std::cout << distances << std::endl;
}
raycast_cuda.cu
#include <cuda.h>
#include <cuda_runtime.h>
#include <ATen/ATen.h>
#include <torch/types.h>

__host__
at::Tensor measure_distance_cuda(at::Tensor vertices) {
    // get return tensor and accessor ****NO ERROR HERE****
    // (n_rays and n_faces are defined elsewhere; trimmed for the MWE)
    at::TensorOptions return_tensor_options = at::TensorOptions().device(torch::kCUDA);
    at::Tensor distances = at::zeros({n_rays, n_faces}, return_tensor_options);
    at::PackedTensorAccessor32<float_t, 2> d_acc = distances.packed_accessor32<float_t, 2>();

    // get accessors for inputs ****ERROR HAPPENS HERE****
    at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();

    return distances;
}
Some thoughts:
- I noted that creating an accessor for the return values (distances) gives me no issues. It's only angry at me for trying it on the tensors I passed into the function, so I'm suspicious that I'm doing something in the wrong scope.
Why is this happening?
Answer 1 (score: 1)
I got a quick answer on the PyTorch forums... The answer was simple: I'm declaring my inputs as kFloat64, which corresponds to double_t, not float_t.
auto vert_options = at::TensorOptions().dtype(torch::kFloat64).device(torch::kCUDA);
should be
auto vert_options = at::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
so that I can call
at::PackedTensorAccessor32<float_t, 2> vert_acc = vertices.packed_accessor32<float_t, 2>();