substrings and the Go garbage collector
Question
When taking a substring of a string in Go, no new memory is allocated. Instead, the underlying representation of the substring contains a Data pointer that points into the original string's data, at some offset from the original string's Data pointer.
This means that if I have a large string and wish to keep track of a small substring, the garbage collector will be unable to free any of the large string until I release all references to the shorter substring.
Slices have a similar problem, but you can get around it by making a copy of the subslice using copy(). I am unaware of any similar copy operation for strings. What is the idiomatic and fastest way to make a "copy" of a substring?
Answer 1
Score: 1
For example, converting the substring to a []byte and back to a string forces a copy into new memory:
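package main

import (
	"fmt"
	"unsafe"
)

// String mirrors the runtime's string header (data pointer, length) so the
// pointers can be printed; this relies on an implementation detail of gc.
type String struct {
	str *byte
	len int
}

func main() {
	str := "abc"
	// []byte(...) copies the bytes, and string(...) wraps them in a string
	// that no longer points into str's data.
	substr := string([]byte(str[1:]))
	fmt.Println(str, substr)
	fmt.Println(*(*String)(unsafe.Pointer(&str)), *(*String)(unsafe.Pointer(&substr)))
}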
Output (the pointer values differ from run to run):
abc bc
{0x4c0640 3} {0xc21000c940 2}
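On Go 1.20 and later, the standard library also provides strings.Clone, which is documented to copy its argument into a new allocation and is intended for exactly this case of retaining a small piece of a much larger string. A minimal sketch, assuming Go 1.20+; keepTail and dataPtr are hypothetical helpers, and the address check only serves to show that the result no longer points into the original buffer.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// dataPtr returns the address of s's first byte (hypothetical helper; Go 1.20+).
func dataPtr(s string) uintptr {
	return uintptr(unsafe.Pointer(unsafe.StringData(s)))
}

// keepTail copies the last n bytes of s into a fresh allocation, so holding
// the result does not keep s's backing array reachable.
func keepTail(s string, n int) string {
	return strings.Clone(s[len(s)-n:])
}

func main() {
	big := strings.Repeat("x", 1<<20) + "needle"
	kept := keepTail(big, 6)

	start, end := dataPtr(big), dataPtr(big)+uintptr(len(big))
	p := dataPtr(kept)
	fmt.Println(kept)                  // needle
	fmt.Println(p < start || p >= end) // true: kept lies outside big's buffer
}
```

Unlike string([]byte(...)), this makes a single copy and states the intent in the function name.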
Answer 2
Score: 0
I know this is an old question, but there are a couple of ways to do this that copy the data you want only once instead of twice.
First, create a []byte of the substring, then coerce it to a string using unsafe.Pointer. This works because a []byte header begins with the same two fields as a string header (a data pointer and a length); the []byte header only has an extra Cap field at the end, which the cast simply ignores.
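package main

import (
	"fmt"
	"unsafe"
)

func main() {
	str := "foobar"
	byt := []byte(str[3:]) // the only copy of the substring's bytes
	// Reinterpret the slice header as a string header: Data and Len line up,
	// Cap is ignored. byt must never be modified after this point.
	sub := *(*string)(unsafe.Pointer(&byt))
	fmt.Println(str, sub)
}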
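Since Go 1.20, the same single-copy idea can be expressed with unsafe.String and unsafe.SliceData instead of reinterpreting the slice header. A sketch under that assumption; copySubstring is a hypothetical helper name, and the same rule applies as above: the byte slice must never be written to once it backs a string.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copySubstring copies s[i:j] once and reuses that buffer as the string's data
// (hypothetical helper; requires Go 1.20+ for unsafe.String and unsafe.SliceData).
func copySubstring(s string, i, j int) string {
	b := []byte(s[i:j]) // the only copy; b is never modified afterwards
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	str := "foobar"
	fmt.Println(str, copySubstring(str, 3, len(str))) // foobar bar
}
```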
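The second way is to use reflect.StringHeader and reflect.SliceHeader to transfer the header fields explicitly. (Both types have since been deprecated in favor of unsafe.String, unsafe.StringData, unsafe.Slice and unsafe.SliceData.)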
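package main

import (
	"fmt"
	"reflect"
	"runtime"
	"unsafe"
)

func main() {
	str := "foobar"
	byt := []byte(str[3:])
	var sub string
	// Point the header types at real variables instead of building a standalone
	// reflect.StringHeader value: Data is a uintptr, which the garbage collector
	// does not treat as a reference to the bytes.
	bytHdr := (*reflect.SliceHeader)(unsafe.Pointer(&byt))
	subHdr := (*reflect.StringHeader)(unsafe.Pointer(&sub))
	subHdr.Data = bytHdr.Data
	subHdr.Len = bytHdr.Len
	runtime.KeepAlive(byt) // keep the bytes reachable until sub refers to them
	fmt.Println(str, sub)
}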
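Both methods rely on a []byte header beginning with the same Data and Len fields as a string header. A small sketch that checks this claim using the reflect header types from the answer (deprecated, but still present in the standard library); layoutMatches is a hypothetical helper, and what it reports is the gc toolchain's layout, not a guarantee in the language specification.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// layoutMatches reports whether a slice header starts with the same (Data, Len)
// fields, at the same offsets, as a string header, followed by one extra Cap word.
// The variables are only inspected at compile time by unsafe.Offsetof/Sizeof.
func layoutMatches() bool {
	var sh reflect.SliceHeader
	var st reflect.StringHeader
	return unsafe.Offsetof(sh.Data) == unsafe.Offsetof(st.Data) &&
		unsafe.Offsetof(sh.Len) == unsafe.Offsetof(st.Len) &&
		unsafe.Sizeof(sh) == unsafe.Sizeof(st)+unsafe.Sizeof(sh.Cap)
}

func main() {
	word := unsafe.Sizeof(uintptr(0))
	fmt.Println(unsafe.Sizeof("")/word, unsafe.Sizeof([]byte(nil))/word) // 2 3
	fmt.Println(layoutMatches())                                         // true
}
```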