Why is Golang's slice operation so complicated?

Question

It's my first day with Go, and while trying out slice operations such as append(), I ran into something that really confuses me:

package main

import "fmt"

func main() {
	s := []int{2, 3, 5, 7, 11, 13}
	a := s[2:4]
	a = append(a, 1000, 1001)
	a[1] = 100
	printSlice("a:", a)
	printSlice("s:", s)
}

func printSlice(title string, s []int) {
	fmt.Printf("%s  len=%d cap=%d %v\n", title, len(s), cap(s), s)
}

When I append only two numbers to a, like:

a = append(a, 1000, 1001)

...the result is:

a:  len=4 cap=4 [5 100 1000 1001]
s:  len=6 cap=6 [2 3 5 100 1000 1001]

I think this shows that a is a reference to s.

But when I change that line of code to:

a = append(a, 1000, 1001, 1002)

...the result becomes:

a:  len=5 cap=8 [5 100 1000 1001 1002]
s:  len=6 cap=6 [2 3 5 7 11 13]

I think a has been moved to another segment of memory large enough to hold all the elements, which detached it from s.

This seems so inconsistent and confusing that it is really easy to make this mistake (for example, when you append a variable number of values).

Why is Golang designed like this? And if I just want operations like JavaScript's slice and push, how can I avoid this?

Answer 1

Score: 7

This is a gotcha related to how slices are implemented in Go.

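Internally, the Go runtime represents a slice with a struct like this (see runtime/slice.go):

type slice struct {
    array unsafe.Pointer // points to the slice's first element inside the backing array
    len   int            // number of elements currently in the slice
    cap   int            // number of elements from that point to the end of the backing array
}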

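So a slice is a pointer into a backing array plus a length and a capacity. As long as an append fits within the capacity, append writes the new items into the existing backing array, so every slice that shares that array sees them. If the append would exceed the current capacity, a new, larger array is allocated, the existing elements are copied into it, and the returned slice points to the new array. Other subslices may still point to the old array, so it is kept as is until no references to it remain. In the question, a := s[2:4] has len 2 but cap 4 (it can extend to the end of s), which is why appending two values overwrote s[4] and s[5], while appending three values forced a new array.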
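A minimal sketch that replays the question's experiment and reports len and cap, so you can see when append writes into s's backing array and when it allocates a new one. The helper name appendToSubslice is hypothetical, not part of any API.

```go
package main

import "fmt"

// appendToSubslice (hypothetical helper) replays the question's experiment:
// take a = s[2:4], append extra to it, then set a[1] = 100.
func appendToSubslice(extra ...int) (a, s []int) {
	s = []int{2, 3, 5, 7, 11, 13}
	// a has len 2 but cap 4: it may grow into s[4] and s[5].
	a = s[2:4]
	// Within cap, append writes into s's array; beyond cap, it allocates a new one.
	a = append(a, extra...)
	// This write is visible through s only if a still shares s's array.
	a[1] = 100
	return a, s
}

func main() {
	// Two extra values fit in cap(a) == 4, so s is modified.
	a, s := appendToSubslice(1000, 1001)
	fmt.Println(len(a), cap(a), a, s)

	// Three extra values exceed cap(a), so a moves to a new array and s is untouched.
	a, s = appendToSubslice(1000, 1001, 1002)
	fmt.Println(len(a), cap(a), a, s)
}
```

The exact capacity of the new array is chosen by the runtime's growth policy (the question observed 8), so code should not depend on that number.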


Now let's say we have a slice A: [1, 2, 3, 4, 5, 6] and a subslice B that points to the last 3 items of A: [4, 5, 6].

[1, 2, 3, 4, 5, 6]
 ^        ^
 |        |
 |        B------
 |
 A---------------  

Now suppose we append an item to B. B has no spare capacity (it already reaches the end of A's array), so under the behaviour you expect, A would have to grow as well, which means allocating a new array and copying all of A into it. That is wasteful when the subslice is small compared to the whole array: appending 1 item to B could mean copying 1,000 items from the original array.
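The A/B scenario from the diagram, written as a small Go sketch (the function name subsliceAppend is ours). Because B already reaches the end of A's array, cap(B) is 3, so appending to B allocates a new array holding only B's elements; A is neither copied nor changed.

```go
package main

import "fmt"

// subsliceAppend (hypothetical name) builds the A/B example:
// B views the last three items of A, then grows past its capacity.
func subsliceAppend() (A, B []int) {
	A = []int{1, 2, 3, 4, 5, 6}
	// B is [4 5 6] with len 3 and cap 3: it already reaches the end of A's array.
	B = A[3:]
	// B has no spare capacity, so append allocates a new array and copies
	// only B's 3 elements into it; A is neither copied nor grown.
	B = append(B, 7)
	// This write goes to the new array only.
	B[0] = 40
	return A, B
}

func main() {
	A, B := subsliceAppend()
	fmt.Println(A) // [1 2 3 4 5 6]
	fmt.Println(B) // [40 5 6 7]
}
```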

Moreover, to keep things consistent, every other reference (subslice) pointing to the old array would have to be updated to point to the corresponding position in the new array. That means each slice would need to store extra information, such as its start index, and this gets tricky once you have a subslice of a subslice.
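Every subslice, including a subslice of a subslice, is an independent header (pointer, len, cap) over the same array; nothing links the headers to each other. The sketch below (the function name nestedAppend and the variables C and D are illustrative only) shows an append on a nested subslice that fits within its capacity and therefore writes straight into A's array. If append relocated arrays instead, A, C and D would all need the bookkeeping described above.

```go
package main

import "fmt"

// nestedAppend (hypothetical name) shows that a subslice of a subslice
// still views A's array, with capacity running to the end of A.
func nestedAppend() (A, C, D []int) {
	A = []int{1, 2, 3, 4, 5, 6}
	// C is [2 3] with cap 5 (from A[1] to the end of A).
	C = A[1:3]
	// D is [3] with cap 4 (from A[2] to the end of A).
	D = C[1:2]
	// len(D) < cap(D), so append writes in place: D[1] is A[3].
	D = append(D, 99)
	return A, C, D
}

func main() {
	A, C, D := nestedAppend()
	fmt.Println(A) // [1 2 3 99 5 6]
	fmt.Println(C) // [2 3]
	fmt.Println(D) // [3 99]
}
```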

Hence the current implementation makes sense.


The recommended approach here is to make a copy of the subslice instead of working on it directly, which prevents such issues. Another advantage of a copy is that if the original slice is huge and nothing else references it, it can be garbage collected; as long as a subslice still references the original array, that whole array is kept in memory.
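A sketch of the copy approach applied to the question's code, assuming a hypothetical helper detachedCopy built on make and the builtin copy. For comparison it also shows a full slice expression s[low:high:max], a standard Go language feature that the answer does not mention: it limits the capacity to max-low, so the first append that grows the subslice must allocate a new array.

```go
package main

import "fmt"

// detachedCopy (hypothetical helper) returns an independent copy of s[lo:hi].
// Writes to or appends on the result never touch s's backing array.
func detachedCopy(s []int, lo, hi int) []int {
	c := make([]int, hi-lo)
	copy(c, s[lo:hi])
	return c
}

func main() {
	s := []int{2, 3, 5, 7, 11, 13}

	// Option 1: work on a copy, as the answer recommends.
	a := detachedCopy(s, 2, 4)
	a = append(a, 1000, 1001)
	a[1] = 100
	fmt.Println(a, s) // [5 100 1000 1001] [2 3 5 7 11 13]

	// Option 2: a full slice expression caps cap(b) at 4-2 = 2,
	// so this append must allocate a new array for b.
	b := s[2:4:4]
	b = append(b, 1000)
	b[1] = 100
	fmt.Println(b, s) // [5 100 1000] [2 3 5 7 11 13]
}
```

The full slice expression only guards against append: until b grows, a write such as b[0] = x still lands in s. Only the copy lets a large original array be garbage collected while the subslice is still in use.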

huangapple
  • Posted on 2016-11-23 06:28:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40752795.html