June 22, 2023, 20:19:28

Fast read/unpacking of float32 from int16 in Python

Question

Say I have a Python script that reads binary data packed as int16, and I want to convert it to float32 as fast as possible.

Currently I am doing this for each file:

data = np.fromfile(fid, 'int16').astype('float32')

Unfortunately, the fromfile call and the astype call each take about equally long (several seconds in my case). Is there a faster way to do this?

Maybe by initializing a zero array and using np.frombuffer to populate it two bytes at a time?

Please advise, thanks.

Answer 1

Score: 1

You can try an alternative approach: read and convert the data in smaller chunks.

Here's an example:

import os
import numpy as np

chunk_size = 1000  # number of int16 elements to read per chunk
file_size = os.path.getsize(file)  # file is the path to the binary file
float32_array = np.empty(file_size // 2, dtype=np.float32)  # int16 takes 2 bytes
start = 0
# Keep one handle open so each read continues where the previous one ended;
# passing the path to np.fromfile on every iteration would re-read the first chunk.
with open(file, 'rb') as fid:
    while start < float32_array.size:
        chunk = np.fromfile(fid, dtype=np.int16, count=chunk_size)
        if chunk.size == 0:
            break
        # Assigning into the float32 slice performs the conversion; the last chunk may be shorter.
        float32_array[start:start + chunk.size] = chunk
        start += chunk.size
