What is the most efficient way to read a file of bytes into an int64 slice?

Question

I have several files of packed int64s. I need them in memory as int64 slices. The problem is that, taken together, the files are more than half the size of the machine's memory, so space is limited. The standard option in Go would be something like:

a := make([]int64, f.Size()/8)
binary.Read(f, binary.LittleEndian, a)

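Unfortunately, binary.Read immediately allocates a temporary []byte of len(a)*8 bytes (the size of the whole file) on top of a. Peak memory is therefore about twice the file size, and the machine runs out of memory.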

It does work if I read each byte one at a time and copy it into the slice, but this is prohibitively slow.

The ideal case would be something like casting the []byte directly to []int64, just telling the compiler "OK, these are ints now", but obviously that doesn't work. Is there some way to accomplish something similar, possibly using the unsafe package, or by dropping into C if absolutely needed?

Answer 1

Score: 2

> I have several files of packed int64s. I need them in memory as int64 slices. The problem is that, taken together, the files are more than half the size of the machine's memory, so space is limited.
>
> The standard option in Go would be something like:
>
> a := make([]int64, f.Size()/8)
> binary.Read(f, binary.LittleEndian, a)
>
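> Unfortunately, binary.Read immediately allocates a temporary []byte of len(a)*8 bytes (the size of the whole file) on top of a. Peak memory is therefore about twice the file size, and the machine runs out of memory.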


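All three functions read the file with a single os.ReadFile call and reinterpret that same buffer as an []int64, converting in place where needed. Peak memory is therefore about the size of the file, and no second buffer is allocated.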


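import (
	"encoding/binary" // used by readFileInt64LE and readFileInt64BE
	"os"
	"unsafe"
)

// Same byte order for the architecture and the data.
// Most efficient (no data conversion). Needs Go 1.20+ for unsafe.SliceData.
// The cast assumes os.ReadFile's buffer is aligned for int64, which Go's
// allocator provides in practice but the language spec does not guarantee.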
func readFileInt64SE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

	return i64, nil
}

For example, on amd64 (a little-endian architecture) reading little-endian data, use readFileInt64SE for maximum efficiency, since no data conversion is necessary.


The byte order fallacy - Rob Pike
https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
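Pike's post argues that code should decode according to the data's byte order alone, using shifts, instead of asking about the host's byte order. The following sketch shows that idea for 64-bit values. The names le64 and be64 are hypothetical helpers. binary.LittleEndian.Uint64 and binary.BigEndian.Uint64, used in the functions that follow, compute the same results.

```go
package main

import "fmt"

// le64 and be64 are hypothetical helpers that build a uint64 from 8 bytes
// with shifts and ORs. The result depends only on the data's byte order,
// never on the host's, which is the point of Pike's post.
func le64(b []byte) uint64 {
	_ = b[7] // early bounds check, a hint that lets the compiler skip later ones
	return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
		uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}

func be64(b []byte) uint64 {
	_ = b[7]
	return uint64(b[7]) | uint64(b[6])<<8 | uint64(b[5])<<16 | uint64(b[4])<<24 |
		uint64(b[3])<<32 | uint64(b[2])<<40 | uint64(b[1])<<48 | uint64(b[0])<<56
}

func main() {
	data := []byte{1, 0, 0, 0, 0, 0, 0, 0}
	fmt.Printf("%#x %#x\n", le64(data), be64(data)) // 0x1 0x100000000000000
}
```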


// LittleEndian in-place data conversion for any architecture
func readFileInt64LE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

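	// Decode each 8-byte group and store the value back over the same bytes;
	// Uint64 reads all 8 bytes before the store, so the in-place update is safe.
	for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {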
		i64[j] = int64(binary.LittleEndian.Uint64(b[i-i64Size : i]))
	}

	return i64, nil
}

// BigEndian in-place data conversion for any architecture
func readFileInt64BE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

	for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {
		i64[j] = int64(binary.BigEndian.Uint64(b[i-i64Size : i]))
	}

	return i64, nil
}

huangapple
  • Published on June 20, 2023 at 13:07:29
  • When reposting, please keep this link: https://go.coder-hub.com/76511549.html