Storing matrices in Go in a compressed binary format

Question

I am exploring a comparison between Go and Python, particularly for mathematical computation. I noticed that Go has a matrix package mat64.

  1. For anyone who uses both Go and Python: is there a function or tool for Go's matrices equivalent to NumPy's savez_compressed, which stores data in the npz format (i.e. "compressed" binary, multiple matrices per file)?

  2. Also, can Go's matrices handle string types like NumPy does?

Answer 1

Score: 2

  1. .npz is a NumPy-specific format. It is unlikely that Go itself would ever support this format in the standard library. I also don't know of any third-party library that exists today, and a (10-second) search didn't turn one up. If you need npz specifically, go with Python + NumPy.

If you just want something similar from Go, you can use any format. Binary options include Go's encoding/binary and encoding/gob packages. Depending on what you're trying to do, you could even use a non-binary format like JSON and compress it yourself.
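As a rough illustration of the gob route, here is a minimal sketch that stores several named matrices in one gzip-compressed blob using only the standard library. The function names saveCompressed and loadCompressed, and the choice of map[string][][]float64 as the container, are assumptions made for this example; the resulting bytes are not npz-compatible and can only be read back by Go code that knows the same type.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// saveCompressed gob-encodes a set of named matrices and gzips the result.
// (Hypothetical helper, not part of any library.)
func saveCompressed(mats map[string][][]float64) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(mats); err != nil {
		return nil, err
	}
	// Close flushes pending compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// loadCompressed reverses saveCompressed.
func loadCompressed(data []byte) (map[string][][]float64, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var mats map[string][][]float64
	if err := gob.NewDecoder(zr).Decode(&mats); err != nil {
		return nil, err
	}
	return mats, nil
}

func main() {
	mats := map[string][][]float64{
		"a": {{1, 2}, {3, 4}},
		"b": {{5, 6, 7}},
	}
	data, err := saveCompressed(mats)
	if err != nil {
		panic(err)
	}
	back, err := loadCompressed(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), back["a"], back["b"])
}
```

To persist the blob, the returned bytes can be written to disk with os.WriteFile, or the gzip writer can wrap an *os.File directly.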

  2. Go doesn't have built-in matrices. The library you found is third-party, and it only handles float64s.

However, if you just need to store strings in matrix (n-dimensional) form, you can use an n-dimensional slice. In two dimensions it looks like this: var myStringMatrix [][]string.
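A declaration alone leaves the slice nil, so the rows have to be allocated before indexing. A minimal sketch, where newStringMatrix is a hypothetical helper name chosen for this example:

```go
package main

import "fmt"

// newStringMatrix allocates a rows x cols matrix of empty strings.
// (Hypothetical helper, not part of any library.)
func newStringMatrix(rows, cols int) [][]string {
	m := make([][]string, rows)
	for i := range m {
		m[i] = make([]string, cols)
	}
	return m
}

func main() {
	m := newStringMatrix(2, 3)
	m[0][1] = "foo"
	m[1][2] = "bar"
	for _, row := range m {
		fmt.Printf("%q\n", row)
	}
}
```

Each row is a separate slice, so nothing forces the rows to have equal length; code that needs a rectangular matrix has to maintain that itself.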

Answer 2

Score: 1

npz files are zip archives. Archiving and (optional) compression are handled by Python's zipfile module. The npz contains one npy file for each variable that you save. Any OS-based archiving tool can decompress and extract the component .npy files.
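Since the container is plain zip, Go's archive/zip package can produce and read the same kind of archive. The sketch below writes named payloads as deflate-compressed entries and reads them back; writeArchive and readArchive are hypothetical names for this example, and the payloads here are placeholder bytes, not real .npy data.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// writeArchive stores each named payload as one deflate-compressed zip entry.
func writeArchive(entries map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(entries))
	for name := range entries {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic entry order

	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: zip.Deflate})
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(entries[name]); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readArchive returns the decompressed contents of every entry, keyed by name.
func readArchive(data []byte) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string][]byte)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = b
	}
	return out, nil
}

func main() {
	data, err := writeArchive(map[string][]byte{
		"a.npy": []byte("placeholder for matrix a"),
		"b.npy": []byte("placeholder for matrix b"),
	})
	if err != nil {
		panic(err)
	}
	back, err := readArchive(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back), "entries")
}
```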

So the remaining question is: can you simulate the npy format? It isn't trivial, but it isn't difficult either. It consists of a header block that contains shape, dtype, and order information, followed by a data block, which is effectively a byte image of the array's data buffer.

So the buffer information and the data are closely tied to the NumPy array content. And if the variable isn't a normal array, save uses the Python pickle mechanism.
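To make the layout concrete, here is a sketch of an encoder for the simplest case: a rectangular 2-D float64 matrix written as a version 1.0 .npy file in C order. It follows my reading of the published format (magic string, two version bytes, a little-endian uint16 header length, then a Python dict literal with the keys descr, fortran_order, and shape, space-padded and newline-terminated so the data starts on a 64-byte boundary). encodeNpy is a hypothetical name, and the output has not been checked here against every NumPy version.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strings"
)

// encodeNpy serializes a rectangular float64 matrix as a version 1.0 .npy file.
// (Hypothetical helper; handles only 2-D little-endian float64 data.)
func encodeNpy(m [][]float64) ([]byte, error) {
	rows, cols := len(m), 0
	if rows > 0 {
		cols = len(m[0])
	}
	for _, row := range m {
		if len(row) != cols {
			return nil, fmt.Errorf("ragged matrix: row has %d columns, want %d", len(row), cols)
		}
	}

	dict := fmt.Sprintf("{'descr': '<f8', 'fortran_order': False, 'shape': (%d, %d), }", rows, cols)
	// 10 bytes precede the dict: 6 magic + 2 version + 2 header length.
	// Pad with spaces so that prefix + dict + padding + '\n' is a multiple of 64.
	const prefix = 10
	pad := (64 - (prefix+len(dict)+1)%64) % 64
	header := dict + strings.Repeat(" ", pad) + "\n"

	var buf bytes.Buffer
	buf.WriteString("\x93NUMPY")
	buf.Write([]byte{1, 0}) // format version 1.0
	if err := binary.Write(&buf, binary.LittleEndian, uint16(len(header))); err != nil {
		return nil, err
	}
	buf.WriteString(header)
	// Data block: rows back to back (C order), 8 bytes per little-endian float64.
	for _, row := range m {
		if err := binary.Write(&buf, binary.LittleEndian, row); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	b, err := encodeNpy([][]float64{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(b), "bytes")
}
```

Writing one such payload per matrix into a zip archive under names ending in .npy should give a file with the same structure as an npz, though that combination is an assumption worth verifying with np.load before relying on it.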

For a start I'd suggest using the CSV format. It's not binary, and it's not fast, but everyone and his brother can generate and read it. We constantly get SO questions about reading such files with np.loadtxt or np.genfromtxt. Look at the code for np.savetxt to see how NumPy produces such files. It's pretty simple.
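On the Go side, the standard encoding/csv package covers this. A minimal sketch, with matrixToCSV and csvToMatrix as hypothetical helper names; the text it produces is comma-separated, so a NumPy reader would need to be told the delimiter.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// matrixToCSV writes one CSV record per matrix row.
func matrixToCSV(m [][]float64) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, row := range m {
		rec := make([]string, len(row))
		for j, v := range row {
			// 'g' with precision -1 uses the shortest text that round-trips.
			rec[j] = strconv.FormatFloat(v, 'g', -1, 64)
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// csvToMatrix parses text produced by matrixToCSV back into a matrix.
func csvToMatrix(s string) ([][]float64, error) {
	records, err := csv.NewReader(strings.NewReader(s)).ReadAll()
	if err != nil {
		return nil, err
	}
	m := make([][]float64, len(records))
	for i, rec := range records {
		m[i] = make([]float64, len(rec))
		for j, field := range rec {
			if m[i][j], err = strconv.ParseFloat(field, 64); err != nil {
				return nil, err
			}
		}
	}
	return m, nil
}

func main() {
	s, err := matrixToCSV([][]float64{{1, 2.5}, {3, 4}})
	if err != nil {
		panic(err)
	}
	fmt.Print(s)
	m, err := csvToMatrix(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(m)
}
```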

Another general-purpose choice would be JSON, using the tolist format of an array. That comes to mind because Go is Google's home-grown alternative to Python for web applications. JSON is a cross-language format based on simplified JavaScript syntax.
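In Go, a [][]float64 marshals to the same nested-list shape that tolist produces, so a map of named matrices becomes a single JSON object. A minimal sketch using encoding/json, with encodeMatrices and decodeMatrices as hypothetical helper names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeMatrices renders named matrices as one JSON object of nested lists.
func encodeMatrices(mats map[string][][]float64) ([]byte, error) {
	return json.Marshal(mats)
}

// decodeMatrices parses JSON produced by encodeMatrices.
func decodeMatrices(data []byte) (map[string][][]float64, error) {
	var mats map[string][][]float64
	if err := json.Unmarshal(data, &mats); err != nil {
		return nil, err
	}
	return mats, nil
}

func main() {
	data, err := encodeMatrices(map[string][][]float64{
		"a": {{1, 2}, {3, 4}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // prints {"a":[[1,2],[3,4]]}

	back, err := decodeMatrices(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(back["a"][1][0])
}
```

One limitation to keep in mind: encoding/json returns an error for NaN and infinite values, so matrices containing them need a different representation.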

huangapple
  • Posted on September 24, 2015, 02:59:18
  • Please keep this link when reposting: https://go.coder-hub.com/32747443.html