Does the conversion from string to rune slice make a copy?

Question

I'm teaching myself Go, coming from a C background.
The code below works as I expect (the first two Printf() calls access bytes, the last two access code points).

What I am not clear on is whether this involves any copying of data.

package main

import "fmt"

var a string

func main() {
	a = "èe"
	fmt.Printf("%d\n", a[0])
	fmt.Printf("%d\n", a[1])
	fmt.Println("")
	fmt.Printf("%d\n", []rune(a)[0])
	fmt.Printf("%d\n", []rune(a)[1])
}

In other words:

> Does []rune("string") create an array of runes and fill it with the runes corresponding to "string", or does the compiler just figure out how to get runes from the string bytes?

Answer 1

Score: 7

It is not possible to turn a string (a sequence of bytes, i.e. uint8 values) into a []rune (rune is an alias for int32) without allocating a new array, because the element types have different sizes.

Also, strings are immutable in Go but slices are not, so the conversions to both []byte and []rune must copy the string's contents in one way or another.
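The immutability argument can be checked with a small experiment. The following is only a minimal sketch: the helper name firstRuneReplaced is hypothetical and not part of the original answer. It writes to the rune slice and shows that the source string is unaffected, which is what a copy implies.

```go
package main

import "fmt"

// firstRuneReplaced is a hypothetical helper for illustration only.
// It converts s to a rune slice, overwrites the first rune in that slice,
// and returns the untouched original string plus the modified copy.
func firstRuneReplaced(s string, r rune) (original, modified string) {
	rs := []rune(s) // builds a separate []rune by decoding the UTF-8 bytes of s
	if len(rs) > 0 {
		rs[0] = r // changes only the slice, never s
	}
	return s, string(rs)
}

func main() {
	// "\u00e8e" is the same text as "èe" from the question.
	orig, mod := firstRuneReplaced("\u00e8e", 'x')
	fmt.Println(orig) // èe
	fmt.Println(mod)  // xe
}
```

Converting back with string(rs) copies again, so a round trip through []rune costs two conversions.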

Answer 2

Score: 6

It involves a copy because:

  • strings are immutable; if the conversion []rune(s) didn't make a copy, you would be able to index the rune slice and change the string contents.
  • a string value is a "(possibly empty) sequence of bytes", where byte is an alias for uint8, whereas a rune is "an integer value identifying a Unicode code point" and an alias for int32. The element types are not identical, and even the lengths may differ:
    a = "èe"
    r := []rune(a)
    fmt.Println(len(a)) // 3 (3 bytes)
    fmt.Println(len(r)) // 2 (2 Unicode code points)

huangapple
  • Posted on July 12, 2021, 19:24:18
  • Original link: https://go.coder-hub.com/68346532.html