Best way to extract strings from binary data in golang using unsafe

Question

I have an application that loads a byte array several gigabytes in size. I don't control the binary format. The program spends most of its time converting sections of the array into strings, doing string manipulation, and then releasing all of the strings. It occasionally runs out of memory when large numbers of clients cause large numbers of objects to be allocated.

Given that the byte array lives in memory for the entire life of the application, it seems like an ideal candidate for using the unsafe package to avoid memory allocation.

Testing this in the Go Playground, it appears a string header (reflect.StringHeader) is needed to produce an actual string. But this means a header must still be allocated every time a string is returned (i.e. the x variable in this example):

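package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	t := []byte{
		65, 66, 67, 68, 69, 70,
		71, 72, 73, 74, 75, 76,
		77, 78, 79, 80, 81, 82,
		83, 84, 85,
	}
	// A fixed array of string headers, meant to be reused rather than allocated.
	// Caution: the unsafe.Pointer rules say StringHeader should only be used via a
	// pointer to a real string; Data is a uintptr, so the GC does not see it as
	// keeping t alive.
	var x [10]reflect.StringHeader

	// Point x[0] at the 4 bytes starting at t[8] ("IJKL").
	h := (*reflect.StringHeader)(unsafe.Pointer(&x[0]))
	h.Len = 4
	h.Data = uintptr(unsafe.Pointer(&t[8]))

	fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[0])))

	// Point x[1] at the 4 bytes starting at t[3] ("DEFG").
	h = (*reflect.StringHeader)(unsafe.Pointer(&x[1]))
	h.Len = 4
	h.Data = uintptr(unsafe.Pointer(&t[3]))

	fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[1])))
}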

I could probably attach a fixed-length array of string headers to each client when it connects to the server (recycled when new clients connect).

This means that (1) string data would no longer be copied around, (2) string headers would not be allocated or garbage collected, and (3) we would know the maximum number of clients per server, because each client has a fixed, hard-coded number of string headers available when pulling out strings.

Am I on track, or is this crazy? Let me know 😀 Thanks.

Answer 1

Score: 2

Use the following function to convert a byte slice to a string without allocation:

import "unsafe"

// btos returns a string that shares p's backing array; no bytes are copied.
func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

The function takes advantage of the fact that the memory layout for a string header is a prefix of the memory layout for a slice header.

Do not modify the backing array of the slice after calling this function -- that will break the assumption that strings are immutable.

Use the function like this:

t := []byte{
	65, 66, 67, 68, 69, 70,
	71, 72, 73, 74, 75, 76,
	77, 78, 79, 80, 81, 82,
	83, 84, 85,
}
s := btos(t[8:12])
fmt.Printf("test %v\n", s) // prints test IJKL

s = btos(t[3:7])
fmt.Printf("test %v\n", s) // prints test DEFG

huangapple
  • This article was posted on January 18, 2022 at 15:00:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/70751437.html