June 20, 2023 13:07:29 · go

What is the most efficient way to read a file of bytes into an int64 slice?

Question

I have several files of packed int64s that I need in memory as int64 slices. Together, the files are more than half the size of the machine's memory, so space is limited. The standard option in Go would be something like:

a := make([]int64, f.Size()/8)
binary.Read(f, binary.LittleEndian, a)

Unfortunately, the binary package immediately allocates a []byte as large as the data being read (f.Size() bytes here, on top of the slice itself), and the process runs out of memory.

It does work if I read each byte one at a time and copy it into the slice, but this is prohibitively slow.

The ideal case would be something like casting the []byte directly to []int64, just telling the compiler "ok, these are ints now", but obviously that doesn't work. Is there some way to accomplish something similar? Possibly using the unsafe package or dropping into C if absolutely needed?

Answer 1

Score: 2

> I have several files of packed int64s that I need in memory as int64 slices. Together, the files are more than half the size of the machine's memory, so space is limited.
>
> The standard option in Go would be something like:
>
> a := make([]int64, f.Size()/8)
> binary.Read(f, binary.LittleEndian, a)
>
> Unfortunately, the binary package immediately allocates a []byte as large as the data being read (f.Size() bytes here, on top of the slice itself), and the process runs out of memory.

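All of the functions below use minimal memory: each reads the file into a single []byte and reinterprets that same buffer as a []int64, so no second copy of the data is allocated.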

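// Requires Go 1.20 or later for unsafe.SliceData.
import (
	"encoding/binary"
	"os"
	"unsafe"
)

// readFileInt64SE is for data whose byte order matches the host architecture.
// Most efficient (no data conversion). Trailing bytes that do not fill
// a whole int64 are ignored.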
func readFileInt64SE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

	return i64, nil
}

For example, on amd64 (a little-endian architecture) with little-endian data, use readFileInt64SE for maximum efficiency: no data conversion is necessary.

The Byte Order Fallacy, Rob Pike
https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html

// LittleEndian in-place data conversion for any architecture
func readFileInt64LE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

	for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {
		i64[j] = int64(binary.LittleEndian.Uint64(b[i-i64Size : i]))
	}

	return i64, nil
}

// BigEndian in-place data conversion for any architecture
func readFileInt64BE(filename string) ([]int64, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	const i64Size = int(unsafe.Sizeof(int64(0)))
	i64Ptr := (*int64)(unsafe.Pointer(unsafe.SliceData(b)))
	i64Len := len(b) / i64Size
	i64 := unsafe.Slice(i64Ptr, i64Len)

	for i, j := i64Size, 0; i <= len(b); i, j = i+i64Size, j+1 {
		i64[j] = int64(binary.BigEndian.Uint64(b[i-i64Size : i]))
	}

	return i64, nil
}
