Substrings and the Go garbage collector

Question

Taking a substring of a string in Go does not allocate new memory. Instead, the substring's underlying representation holds a Data pointer that points into the original string's data at some offset.

This means that if I have a large string and wish to keep track of a small substring, the garbage collector will be unable to free any of the large string until I release all references to the shorter substring.

Slices have a similar problem, but you can get around it by making a copy of the subslice using copy(). I am unaware of any similar copy operation for strings. What is the idiomatic and fastest way to make a "copy" of a substring?

Answer 1

Score: 1

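For example, converting the substring to a []byte and back to a string copies its data into new memory:

package main

import (
	"fmt"
	"unsafe"
)

// String mirrors the layout of a string header (data pointer and length),
// so the headers of str and substr can be printed and compared.
type String struct {
	str *byte
	len int
}

func main() {
	str := "abc"
	// The round trip through []byte copies the bytes of str[1:].
	substr := string([]byte(str[1:]))
	fmt.Println(str, substr)
	// Print both headers: the data pointers differ, so nothing is shared.
	fmt.Println(*(*String)(unsafe.Pointer(&str)), *(*String)(unsafe.Pointer(&substr)))
}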

Output:

abc bc
{0x4c0640 3} {0xc21000c940 2}

Answer 2

Score: 0

I know this is an old question, but there are a couple of ways to do this without creating two copies of the data you want.

The first is to create the []byte of the substring, then coerce it to a string using unsafe.Pointer. This works because the header of a []byte has the same layout as that of a string, except for an extra Cap field at the end, which is simply ignored when the header is read as a string.

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    str := "foobar"
    byt := []byte(str[3:])
    sub := *(*string)(unsafe.Pointer(&byt))
    fmt.Println(str, sub)
}

The second way is to use reflect.StringHeader and reflect.SliceHeader to do a more explicit header transfer.

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    str := "foobar"
    // Copy the substring's bytes once.
    byt := []byte(str[3:])
    // Read the data pointer out of the slice header.
    bytPtr := (*reflect.SliceHeader)(unsafe.Pointer(&byt)).Data
    // Build a string header that points at the same bytes.
    strHdr := reflect.StringHeader{Data: bytPtr, Len: len(byt)}
    sub := *(*string)(unsafe.Pointer(&strHdr))
    fmt.Println(str, sub)
}

huangapple
  • Posted on June 4, 2013 at 12:39:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/16909917.html