C++ Load bin file to tensor SegFault

Question
I have a tensor in PyTorch and I am trying to port it to C++ LibTorch. I made an isolated example to demonstrate the problem.
*The Python code to export the tensor*
```Python
import torch
import numpy as np

# Generate a range of values from 0 to 1000000
values = torch.arange(1000000, dtype=torch.float32)

# Reshape the values into a 1000x1000 tensor
tensor = values.reshape(1000, 1000)

def export_to_binary(tensor, file_path):
    # Convert the tensor to a NumPy array
    arr = np.array(tensor)
    # Write the array to a binary file
    with open(file_path, 'wb') as f:
        arr.tofile(f)

export_to_binary(tensor, 'tensor.bin')
```
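The export itself can be sanity-checked from Python before touching C++. This is a small sketch using plain NumPy (the file name `tensor.bin` and the 1000x1000 shape follow the example above); `tofile` writes a raw, headerless dump, so `fromfile` must be told the dtype and the shape must be restored by hand:

```python
import numpy as np
import os
import tempfile

# Build the same float32 array the export produces
arr = np.arange(1000000, dtype=np.float32).reshape(1000, 1000)

# Write it as a raw binary dump, exactly as arr.tofile(f) does above
path = os.path.join(tempfile.mkdtemp(), 'tensor.bin')
arr.tofile(path)

# The file is 1000 * 1000 * 4 bytes: row-major float32, no header,
# no dtype or shape metadata
assert os.path.getsize(path) == 1000 * 1000 * 4

# Read it back and restore the shape manually
back = np.fromfile(path, dtype=np.float32).reshape(1000, 1000)
assert np.array_equal(arr, back)
print(back[1][798])  # 1798.0
```

This confirms the bytes on disk are what the C++ side expects: a contiguous row-major float32 buffer of exactly `1000 * 1000 * sizeof(float)` bytes.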
In C++ I have a `Foo` class with `bar_` and `baz_` private members.

foo.h
```cpp
#ifndef FOO_H
#define FOO_H

#include <torch/torch.h>

class Foo
{
public:
  Foo();

private:
  torch::Tensor bar_;
  torch::Tensor baz_;
};

#endif // FOO_H
```
In the definition of the constructor I try to load the contents of the tensor.bin file and populate `baz_` from it.

foo.cc
```cpp
#include "foo.h"

#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

#define MATRIX_SIZE 1000

torch::Tensor LoadFromBinary(const std::string &file_path)
{
  // Open the binary file
  std::ifstream file(file_path, std::ios::binary);
  if (!file)
  {
    throw std::runtime_error("Failed to open file: " + file_path);
  }

  // Determine the file size
  file.seekg(0, std::ios::end);
  std::streampos file_size = file.tellg();
  file.seekg(0, std::ios::beg);

  // Check that the file size matches the expected tensor size
  const std::size_t expected_size = MATRIX_SIZE * MATRIX_SIZE * sizeof(float);
  if (file_size != static_cast<std::streampos>(expected_size))
  {
    throw std::runtime_error("File size mismatch: " + file_path);
  }

  // Read the file contents into a vector
  std::vector<float> data(MATRIX_SIZE * MATRIX_SIZE);
  file.read(reinterpret_cast<char *>(data.data()), expected_size);

  // Convert the vector to a tensor
  torch::Tensor tensor = torch::from_blob(data.data(), {MATRIX_SIZE, MATRIX_SIZE});
  return tensor;
}

Foo::Foo()
{
  baz_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});
  baz_ = LoadFromBinary("./tensor.bin");
  std::cout << "baz_ " << baz_[1][798] << std::endl; // SegFault
}
```
I run it through a simple gtest (just `Foo foo;`), but it gives "Exception: SegFault".

However, I found an interesting thing: if I load the same bin file into `bar_` before loading into `baz_`, then I can access `baz_`, but only `baz_`.
```cpp
Foo::Foo()
{
  bar_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});
  baz_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});
  bar_ = LoadFromBinary("./tensor.bin");
  baz_ = LoadFromBinary("./tensor.bin");
  std::cout << "baz_ " << baz_[1][798] << std::endl;
}
```
`baz_` gives back the correct values, but accessing `bar_` is not possible and gives a SegFault.

If I change the order, the same thing happens. It looks like at least two loads are necessary, and it is always the second member that is accessible.
Answer 1

Score: 1
There is an issue with your use of `torch::from_blob`: this function creates a tensor that does not take ownership of the underlying data. In your example, as soon as you exit the scope of the `LoadFromBinary` function, the `vector<float>` is deleted along with the data it contains, so your tensor points to unallocated memory. The weird case where it works for `bar_` and not for `baz_` looks like undefined behavior.

The fix should be to `return tensor.clone();`, I think.