Storing matrices in Go in a compressed binary format
Question
I am exploring a comparison between Go and Python, particularly for mathematical computation. I noticed that Go has a matrix package, mat64.
- I wanted to ask someone who uses both Go and Python whether there are functions or tools equivalent to NumPy's savez_compressed, which stores data in the npz format (i.e. "compressed" binary, multiple matrices per file), for Go's matrices?
- Also, can Go's matrices handle string types like NumPy does?
Answer 1
Score: 2
- .npz is a NumPy-specific format. It is unlikely that Go itself would ever support this format in the standard library. I also don't know of any third-party library that exists today, and a (10-second) search didn't turn one up. If you need npz specifically, go with Python + NumPy.
If you just want something similar from Go, you can use any format. Binary formats include the standard library's encoding/binary and encoding/gob. Depending on what you're trying to do, you could even use a non-binary format like JSON and just compress it on your own.
- Go doesn't have built-in matrices. That library you found is third party, and it only handles float64s.
However, if you just need to store strings in matrix (n-dimensional) format, you would use an n-dimensional slice. For two dimensions it looks like this: var myStringMatrix [][]string.
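To make the "any format" route concrete, here is a minimal sketch that gob-encodes several named matrices and gzip-compresses them into one blob, roughly the role savez_compressed plays on the Python side. The Matrix type and the function names encodeMatrices and decodeMatrices are made up for this illustration; they are not part of mat64 or any other library, and the result is not readable by NumPy.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
	"log"
)

// Matrix is a hypothetical dense, row-major float64 matrix used only in this sketch.
type Matrix struct {
	Rows, Cols int
	Data       []float64 // expected length: Rows*Cols
}

// encodeMatrices gob-encodes the named matrices and gzip-compresses the stream.
func encodeMatrices(ms map[string]Matrix) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(ms); err != nil {
		return nil, err
	}
	// Close flushes pending compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeMatrices reverses encodeMatrices.
func decodeMatrices(b []byte) (map[string]Matrix, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var ms map[string]Matrix
	if err := gob.NewDecoder(zr).Decode(&ms); err != nil {
		return nil, err
	}
	return ms, nil
}

func main() {
	in := map[string]Matrix{
		"a": {Rows: 2, Cols: 2, Data: []float64{1, 2, 3, 4}},
		"b": {Rows: 1, Cols: 3, Data: []float64{5, 6, 7}},
	}
	blob, err := encodeMatrices(in)
	if err != nil {
		log.Fatal(err)
	}
	out, err := decodeMatrices(blob)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(out), out["a"].Data, out["b"].Cols)
}
```

gob can encode a [][]string in the same way, so the same pattern should work for a string "matrix". Writing the blob to disk is then just a matter of passing it to os.WriteFile.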
Answer 2
Score: 1
npz files are zip archives. Archiving and (optional) compression are handled by Python's zipfile module. An npz contains one npy file for each variable that you save. Any OS-level archiving tool can decompress and extract the component .npy files.
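Since the container is an ordinary zip archive, Go's standard archive/zip package can handle that layer. The following is only a sketch of the container part: packArchive and unpackArchive are hypothetical helper names, and the payloads are placeholder bytes rather than real npy data.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"log"
	"sort"
)

// packArchive writes each named payload as one deflate-compressed zip entry.
func packArchive(entries map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	// Sort the names so the archive layout is deterministic.
	names := make([]string, 0, len(entries))
	for name := range entries {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		w, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: zip.Deflate})
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(entries[name]); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpackArchive reads every entry back into memory, keyed by entry name.
func unpackArchive(b []byte) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	if err != nil {
		return nil, err
	}
	out := make(map[string][]byte)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = data
	}
	return out, nil
}

func main() {
	// Placeholder payloads; a real npz would hold npy-encoded bytes here.
	blob, err := packArchive(map[string][]byte{
		"a.npy": []byte("placeholder for matrix a"),
		"b.npy": []byte("placeholder for matrix b"),
	})
	if err != nil {
		log.Fatal(err)
	}
	entries, err := unpackArchive(blob)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(entries), "entries,", len(blob), "bytes in the archive")
}
```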
So the remaining question is: can you simulate the npy format? It isn't trivial, but it isn't difficult either. It consists of a header block that records the shape, dtype, and memory order, followed by a data block, which is, effectively, a byte image of the data buffer of the array.
So the header information and the data are closely tied to the NumPy array's contents. And if the variable isn't a normal array (an object array, for instance), save uses the Python pickle mechanism.
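For a plain 2-D float64 array the layout is simple enough to write by hand. The sketch below follows my understanding of version 1.0 of the npy format (magic string, version bytes, a little-endian header length, a Python-dict-style header padded to a 64-byte boundary, then the raw data in C order). encodeNpy is a hypothetical helper, and the details should be checked against NumPy's own format documentation before relying on them.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"log"
	"strings"
)

// encodeNpy serializes a row-major rows x cols float64 matrix in what is
// assumed to be npy format version 1.0 (little-endian, C order).
func encodeNpy(rows, cols int, data []float64) ([]byte, error) {
	if len(data) != rows*cols {
		return nil, fmt.Errorf("got %d values, want %d", len(data), rows*cols)
	}
	// The header is a Python dict literal describing dtype, order, and shape.
	header := fmt.Sprintf("{'descr': '<f8', 'fortran_order': False, 'shape': (%d, %d), }", rows, cols)
	// 6 magic bytes + 2 version bytes + 2 header-length bytes come first.
	const preamble = 10
	// Pad with spaces so preamble + header (including the final newline)
	// is a multiple of 64 bytes.
	pad := (64 - (preamble+len(header)+1)%64) % 64
	header += strings.Repeat(" ", pad) + "\n"

	var buf bytes.Buffer
	buf.WriteString("\x93NUMPY") // magic string
	buf.Write([]byte{1, 0})      // format version 1.0
	if err := binary.Write(&buf, binary.LittleEndian, uint16(len(header))); err != nil {
		return nil, err
	}
	buf.WriteString(header)
	// The data block is just the raw little-endian float64 values.
	if err := binary.Write(&buf, binary.LittleEndian, data); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	blob, err := encodeNpy(2, 3, []float64{1, 2, 3, 4, 5, 6})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("encoded %d bytes\n", len(blob))
}
```

The intent is that saving these bytes to a .npy file gives something np.load can open, but that is worth testing against the NumPy version you actually use. Other dtypes, 1-D shapes (which need a trailing comma in the shape tuple), and Fortran order are not handled here.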
For a start I'd suggest using the csv format. It's not binary, and not fast, but everyone and his brother can generate and read it. We constantly get SO questions about reading such files using np.loadtxt or np.genfromtxt. Look at the code for np.savetxt to see how numpy produces such files. It's pretty simple.
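On the Go side, the standard encoding/csv package covers this. A minimal sketch, with matrixToCSV and csvToMatrix as hypothetical helper names and a [][]float64 standing in for whatever matrix type you actually use:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// matrixToCSV writes one CSV record per matrix row.
func matrixToCSV(m [][]float64) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	for _, row := range m {
		rec := make([]string, len(row))
		for j, v := range row {
			// 'g' with precision -1 uses the shortest text that round-trips.
			rec[j] = strconv.FormatFloat(v, 'g', -1, 64)
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	// The writer is buffered, so flush before reading the result.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// csvToMatrix parses CSV text back into rows of float64 values.
func csvToMatrix(s string) ([][]float64, error) {
	recs, err := csv.NewReader(strings.NewReader(s)).ReadAll()
	if err != nil {
		return nil, err
	}
	m := make([][]float64, len(recs))
	for i, rec := range recs {
		m[i] = make([]float64, len(rec))
		for j, field := range rec {
			if m[i][j], err = strconv.ParseFloat(field, 64); err != nil {
				return nil, err
			}
		}
	}
	return m, nil
}

func main() {
	text, err := matrixToCSV([][]float64{{1, 2.5}, {3, 4}})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(text)
	back, err := csvToMatrix(text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(back)
}
```

A file written this way should be readable with np.loadtxt by passing delimiter=",".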
Another general-purpose choice would be JSON, using the tolist format of an array. That comes to mind because Go is Google's home-grown alternative to Python for web applications. JSON is a cross-language format based on a simplified JavaScript syntax.
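In Go, nested slices map directly onto the nested lists that tolist produces, so encoding/json needs no extra work. A small sketch, with encodeJSON and decodeJSON as hypothetical helper names and a map used to hold several named matrices in one document:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// encodeJSON marshals named matrices as nested JSON arrays.
func encodeJSON(ms map[string][][]float64) (string, error) {
	b, err := json.Marshal(ms)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeJSON parses the same layout back into nested slices.
func decodeJSON(s string) (map[string][][]float64, error) {
	var ms map[string][][]float64
	if err := json.Unmarshal([]byte(s), &ms); err != nil {
		return nil, err
	}
	return ms, nil
}

func main() {
	text, err := encodeJSON(map[string][][]float64{
		"a": {{1, 2}, {3, 4}},
		"b": {{0.5}},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
	back, err := decodeJSON(text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(back["a"][1][0])
}
```

One caveat: json.Marshal returns an error for NaN and infinite values, so matrices containing them need special handling. The same approach works for a [][]string.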