How do I read doubles from a binary file in a loop?

Question

Simple question, but I can't find a helpful answer on the web. I have a file created with C++, where I first output a std::size_t k and then write 2 * k doubles.

I need to first read the std::size_t k in Python and then iterate in a loop from 0 to k - 1, reading two doubles x and y in each iteration and doing something with them:

    with open('file', 'r') as f:
        fig, ax = pyplot.subplots()
        k = numpy.fromfile(f, numpy.uint64)[0] # does not work
        for j in range(0, k):
            # get double x and y somehow
            x = numpy.fromfile(f, numpy.double)[0]
            y = numpy.fromfile(f, numpy.double)[0]
            ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)

The value I read into k is 3832614067495317556, but it should be 4096. And at the point where I read x, I immediately get an index out of range exception.

Answer 1

Score: 1

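Your C++ code is wrong. The standard << operator performs formatted (text) output, so it writes the decimal digits of a value rather than its raw bytes, which is not what a binary reader expects.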
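A quick way to check this diagnosis is to turn the number from the question back into the eight bytes it was built from. The sketch below uses only the value quoted in the question and assumes a little-endian machine; the variable names are illustrative.

```python
# The value the question reports for k, interpreted as a little-endian uint64.
bogus_k = 3832614067495317556

# Convert the integer back into the 8 bytes that were read from the file.
raw = bogus_k.to_bytes(8, byteorder="little")

# The bytes are printable ASCII: the digits of k followed by the start
# of the first double, written as text with no separator in between.
print(raw)  # b'40960.05'
```

So the file apparently starts with the text 4096 immediately followed by text such as 0.05, not with binary data.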

Use the following:

    #include <cstddef>
    #include <fstream>

    int main()
    {
        // Binary mode prevents newline translation on Windows.
        std::ofstream out("file.dat", std::ios::binary);
        std::size_t k = 4096;
        // write() copies the raw bytes of k instead of formatting it as text.
        out.write(reinterpret_cast<const char*>(&k), sizeof k);
        double a = 1.1;
        for (int i = 0; i < 8; ++i) {
            auto b = a * i;
            out.write(reinterpret_cast<const char*>(&b), sizeof b);
        }
    }

See the example at the bottom of https://en.cppreference.com/w/cpp/io/basic_ofstream, where I grabbed this from.


(The rest of this answer covers mistakes in, and improvements to, the Python code, even though they are not the root problem.)

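Open the file in binary mode (necessary on Windows), read just 1 item for the size, and read all the remaining doubles in one go using the offset parameter. Without count=1, numpy.fromfile reads to the end of the file, so the next call returns an empty array and indexing it with [0] fails, which explains the index out of range exception. Also verify that std::size_t is equal to uint64 for the relevant machine(s).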

    import numpy as np
    from matplotlib import pyplot

    with open('file.dat', 'rb') as f:
        fig, ax = pyplot.subplots()
        # count is in items, offset is in bytes
        k = np.fromfile(f, np.uint64, count=1)[0]
        # Rewind so that offset=8 below skips exactly the 8-byte size header
        f.seek(0)
        xy = np.fromfile(f, np.double, offset=8)
        # Even indices hold x values, odd indices hold y values
        for x, y in zip(xy[::2], xy[1::2]):
            ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
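If the values really do have to be consumed pair by pair, the standard-library struct module can read them in a loop without NumPy. This is a minimal sketch, assuming an 8-byte little-endian std::size_t and 8-byte IEEE 754 doubles. The helper names write_sample and read_pairs and the file name sample.dat are made up for illustration, and write_sample only stands in for the C++ writer.

```python
import os
import struct
import tempfile


def write_sample(path, pairs):
    """Stand-in for the C++ writer: uint64 count, then x, y doubles."""
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(pairs)))
        for x, y in pairs:
            f.write(struct.pack("<dd", x, y))


def read_pairs(path):
    """Read the count header, then one (x, y) pair per loop iteration."""
    pairs = []
    with open(path, "rb") as f:
        (k,) = struct.unpack("<Q", f.read(8))
        for _ in range(k):
            chunk = f.read(16)  # two 8-byte doubles
            if len(chunk) < 16:
                raise ValueError("file ended before k pairs were read")
            pairs.append(struct.unpack("<dd", chunk))
    return pairs


path = os.path.join(tempfile.mkdtemp(), "sample.dat")
write_sample(path, [(0.25, 0.5), (0.75, 1.0)])
print(read_pairs(path))  # [(0.25, 0.5), (0.75, 1.0)]
```

The explicit "<" in the format strings pins the byte order and sizes, so a mismatch with the writer shows up as wrong values or a short read rather than going unnoticed.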

If you're not doing anything special with the scatter plots, and just want to plot all data in one scatter plot, you don't need a for-loop:

    import numpy as np
    from matplotlib import pyplot

    with open('file.dat', 'rb') as f:
        fig, ax = pyplot.subplots()
        # count is in items, offset is in bytes
        k = np.fromfile(f, np.uint64, count=1)[0]
        # Rewind so that offset=8 below skips exactly the 8-byte size header
        f.seek(0)
        xy = np.fromfile(f, np.double, offset=8)
        # Integer division: np.zeros needs an int, and len(xy) / 2 is a float
        ax.scatter(x = xy[::2], y = xy[1::2], c = np.zeros(len(xy) // 2))
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
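Because the header and the payload are read separately, it can be worth checking that they agree before plotting. The following sketch assumes the layout described in the question (one 8-byte count followed by 2 * k doubles). load_points is a hypothetical helper, and the temporary file it reads is written from Python purely as a stand-in for the C++ output.

```python
import os
import tempfile

import numpy as np


def load_points(path):
    """Return a (k, 2) array of x, y pairs, validating the count header."""
    with open(path, "rb") as f:
        k = int(np.fromfile(f, dtype=np.uint64, count=1)[0])
        # The file position is now just past the header, so no offset is needed.
        xy = np.fromfile(f, dtype=np.float64)
    if xy.size != 2 * k:
        raise ValueError(f"header says {k} pairs, file holds {xy.size} doubles")
    return xy.reshape(k, 2)


# Stand-in for the C++ writer: count header followed by x, y doubles.
path = os.path.join(tempfile.mkdtemp(), "points.dat")
with open(path, "wb") as f:
    np.array([3], dtype=np.uint64).tofile(f)
    np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]).tofile(f)

points = load_points(path)
print(points.shape)  # (3, 2)
```

The columns can then be passed straight to the plot, for example ax.scatter(points[:, 0], points[:, 1]). Note that the demo file produced by the C++ snippet above would fail this check, since it stores a count of 4096 but writes only 8 doubles.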

huangapple
  • Posted on May 11, 2023 at 19:56:41
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/76227399.html