Golang的append()函数在什么情况下会创建一个新的切片?

huangapple go评论83阅读模式
英文:

When does Golang append() create a new slice?

问题

根据内置API文档,当原始切片的容量不足时,append()函数会重新分配并复制到一个新的数组块中。

这是一个(简化版本的)递归算法,用于创建字母表(在本例中是布尔值)的组合。字母表的成员(true、false)被递归地添加到切片中,直到达到正确的长度,然后通过通道发送。

package main

import (
	"fmt"
)

func AddOption(c chan []bool, combo []bool, length int) {
	if length == 0 {
		fmt.Println(combo, "!")
		c <- combo
		return
	}
	var newCombo []bool
	for _, ch := range []bool{true, false} {
		newCombo = append(combo, ch)
		AddOption(c, newCombo, length-1)
	}
}

func main() {
	c := make(chan []bool)
	go func(c chan []bool) {
		defer close(c)
		AddOption(c, []bool{}, 4)
	}(c)
	for combination := range c {
		fmt.Println(combination)
	}
}

这里是这段代码的Playground链接。在输出中:

[true true true true] !
[true true true false] !
[true true true false]
[true true true false]
[true true false true] !
[true true false false] !
[true true false false]
[true true false false]
[true false true true] !
[true false true false] !
[true false true false]
[true false true false]
[true false false true] !
[true false false false] !
[true false false false]
[true false false false]
[false true true true] !
[false true true false] !
[false true true false]
[false true true false]
[false true false true] !
[false true false false] !
[false true false false]
[false true false false]
[false false true true] !
[false false true false] !
[false false true false]
[false false true false]
[false false false true] !
[false false false false] !
[false false false false]
[false false false false]

以感叹号结尾的行是从AddOption发送到通道的内容,没有感叹号的行是从通道的另一侧(即main()函数)出现的内容。很明显,发送到通道的切片在发送后发生了变化。

由于AddOption在发送切片后立即返回,修改必须来自以下代码块:

var newCombo []bool
for _, ch := range []bool{true, false} {
	newCombo = append(combo, ch)
	AddOption(c, newCombo, length-1)
}

但是,根据文档,append()应该返回一个新的切片(cap(combo)不够大)。根据这个答案,发送给AddOption作为第二个参数的切片描述符应该是一个副本;这不是真的吗?据我所知,要么作为第二个参数发送的值是一个切片描述符的指针,要么append()没有返回一个新的切片。

英文:

According to the builtin api docs, append() will reallocate and copy to a new array block when the capacity of the original slice is not large enough.

Here is a (simplified version of) a recursive algorithm for creating combinations of an alphabet (in this case booleans). Members of the alphabet (true, false) are recursively added to a slice until it is the correct length, at which point it is sent over the channel.

package main

import (
	&quot;fmt&quot;
)

func AddOption(c chan []bool, combo []bool, length int) {
	if length == 0 {
		fmt.Println(combo, &quot;!&quot;)
		c &lt;- combo
		return
	}
	var newCombo []bool
	for _, ch := range []bool{true, false} {
		newCombo = append(combo, ch)
		AddOption(c, newCombo, length-1)
	}
}

func main() {
	c := make(chan []bool)
	go func(c chan []bool) {
		defer close(c)
		AddOption(c, []bool{}, 4)
	}(c)
	for combination := range c {
		fmt.Println(combination)
	}
}

Here is the playground link to this code. In the output:

[true true true true] !
[true true true false] !
[true true true false]
[true true true false]
[true true false true] !
[true true false false] !
[true true false false]
[true true false false]
[true false true true] !
[true false true false] !
[true false true false]
[true false true false]
[true false false true] !
[true false false false] !
[true false false false]
[true false false false]
[false true true true] !
[false true true false] !
[false true true false]
[false true true false]
[false true false true] !
[false true false false] !
[false true false false]
[false true false false]
[false false true true] !
[false false true false] !
[false false true false]
[false false true false]
[false false false true] !
[false false false false] !
[false false false false]
[false false false false]

Lines ending in an exclamation mark are those sent into the channel from AddOption. Those without are what emerges on the other side (i.e. in main()). It is clear that the slices send over the channel are changed after they are sent.

Since AddOption returns immediately after sending the slice, the modification has to come from the code block

var newCombo []bool
for _, ch := range []bool{true, false} {
	newCombo = append(combo, ch)
	AddOption(c, newCombo, length-1)
}

But, according to the docs, append() should return a new slice (cap(combo) is not large enough). According to this answer, the slice descriptor sent to AddOption should be a copy; is that not true? As far as I can tell, either the value sent as the second argument to AddOption() is either a pointer to a slice descriptor, or append() is not returning a new slice.

1: http://golang.org/pkg/builtin/#append "builtin append&#40;&#41; docs"
2: https://play.golang.org/p/cDnFkaWCMh "Playground"
3: https://stackoverflow.com/a/20197469/1736306

答案1

得分: 11

你正在混淆切片(slice)这个数据类型和实际的表示方式。切片描述符由两个整数(一个用于长度,一个用于容量)和指向底层数据的指针组成。

所以,append返回的确实是一个新的切片,而传递给add option的确实是切片描述符的副本。但是由于描述符中包含了指向数据的指针,指针的值(底层数据的地址)是相同的。

编辑:这里有一个代码片段来说明我的观点:

package main

import "fmt"

func main() {
    s := make([]int, 0, 5)
    s = append(s, []int{1, 2, 3, 4}...)

    a := append(s, 5)
    fmt.Println(a)

    b := append(s, 6)
    fmt.Println(b)
    fmt.Println(a)
}

如果你运行这段代码,会得到以下输出:

[1 2 3 4 5]
[1 2 3 4 6]
[1 2 3 4 6]

因为s仍然有容量,所以ab共享相同的数据指针。如果你将容量改为4,输出如下:

[1 2 3 4 5]
[1 2 3 4 6]
[1 2 3 4 5]
英文:

You are confusing slice, the data type, with the actual representation. The slice descriptor is composed of a pair of ints, one for len and one for cap, and a pointer to the underlying data.

So, what append returns is indeed a new slice and what is passed to add option is indeed a copy of the slice descriptor. But since the descriptor has a pointer to data, the pointer value (the address to the underlying data) is the same.

EDIT: Here is a code snippet to illustrate my point:

package main

import &quot;fmt&quot;

func main() {
    s := make([]int, 0, 5)
    s = append(s, []int{1, 2, 3, 4}...)

    a := append(s, 5)
    fmt.Println(a)

    b := append(s, 6)
    fmt.Println(b)
    fmt.Println(a)
}

If you run this, you get:

[1 2 3 4 5]
[1 2 3 4 6]
[1 2 3 4 6]

Because since s still has capacity, both a and b share the same data ptr. If you change the capacity to 4, it prints:

[1 2 3 4 5]
[1 2 3 4 6]
[1 2 3 4 5]

答案2

得分: 6

append()创建一个新的切片时,它并不会创建一个比之前切片仅大一个元素的切片。实际上,它创建的切片比之前的切片多几个元素。看一下这段代码:

package main

import "fmt"

func main() {
    var sl []bool

    for i := 0; i < 100; i++ {
        sl = append(sl, true)
        fmt.Println(cap(sl))
    }
}

如果你运行这段代码,你会发现初始时切片的容量每次都会翻倍;当然,对于更大的切片大小,这种策略会有所改变。

英文:

When append() creates a new slice, it doesn't create a slice that's just one larger than the slice before. It actually creates a slice that is already a couple of elements larger than the previous one. Have a look at this code:

package main

import &quot;fmt&quot;

func main() {
    var sl []bool

    for i := 0; i &lt; 100; i++ {
        sl = append(sl, true)
        fmt.Println(cap(sl))
    }
}

<kbd>Playground</kbd>

If you run this code, you see that the capacity initially doubles on every allocation; this strategy is of course changed for larger slice sizes.

答案3

得分: 1

根据链接中的内容:

> Go语言在这方面采取了更加精简和懒惰的方法。它会不断修改同一个底层数组,直到切片的容量达到上限。

这与其他语言中切片的行为非常不同:

> 大多数语言(如Python)在切片指向的底层数组进行写操作时会创建另一个副本。

链接中提到的示例的输出解释了这种行为。

切片 a 长度为 7,容量为 7 [0 0 0 0 0 0 0]
切片 b 指向切片 a 的索引 2、3、4。因此,容量为 5(= 7-2)。
b := a[2:5]
切片 b 长度为 3,容量为 5 [0 0 0]
 
修改切片 b 也会修改切片 a,因为它们指向同一个底层数组。
b[0] = 9
切片 a 长度为 7,容量为 7 [0 0 9 0 0 0 0]
切片 b 长度为 3,容量为 5 [9 0 0]
 
向切片 b 添加 1,覆盖了切片 a。
切片 a 长度为 7,容量为 7 [0 0 9 0 0 1 0]
切片 b 长度为 4,容量为 5 [9 0 0 1]
 
向切片 b 添加 2,覆盖了切片 a。
切片 a 长度为 7,容量为 7 [0 0 9 0 0 1 2]
切片 b 长度为 5,容量为 5 [9 0 0 1 2]
 
向切片 b 添加 3。由于容量超载,这里会创建一个新的副本。
切片 a 长度为 7,容量为 7 [0 0 9 0 0 1 2]
切片 b 长度为 6,容量为 12 [9 0 0 1 2 3]
 
在上一步容量超载后,验证切片 a 和 b 是否指向不同的底层数组。
b[1] = 8
切片 a 长度为 7,容量为 7 [0 0 9 0 0 1 2]
切片 b 长度为 6,容量为 12 [9 8 0 1 2 3]

> 在最后的验证步骤中,当对 b 进行修改时,a 指向的底层数组不再被修改,这感觉有点神奇。逻辑上的期望是:当 b 达到限制时,a 和 b 都指向同一个新分配的底层数组,而不是 a 继续指向旧的底层数组。

当多个切片指向同一个底层数组,并频繁进行 append 操作时,可能会变得棘手。在上面的链接中可以了解更多信息。

英文:

Ref: http://criticalindirection.com/2016/02/17/slice-with-a-pinch-of-salt/

According to the link:

> Go takes a more lean and lazy approach in doing this. It keeps
> modifying the same underlying array until the capacity of a slice is
> reached.

This is quite different from the behavior of slices in other languages:

> Most languages, like Python, create another copy of the underlying
> array when any of the slices pointing to it does a write.

The output of example mentioned, explains the behavior.

Slice a len=7 cap=7 [0 0 0 0 0 0 0]
Slice b refers to the 2, 3, 4 indices in slice a. Hence, the capacity is 5 (= 7-2).
b := a[2:5]
Slice b len=3 cap=5 [0 0 0]
 
Modifying slice b, also modifies a, since they are pointing to the same underlying array.
b[0] = 9
Slice a len=7 cap=7 [0 0 9 0 0 0 0]
Slice b len=3 cap=5 [9 0 0]
 
Appending 1 to slice b. Overwrites a.
Slice a len=7 cap=7 [0 0 9 0 0 1 0]
Slice b len=4 cap=5 [9 0 0 1]
 
Appending 2 to slice b. Overwrites a.
Slice a len=7 cap=7 [0 0 9 0 0 1 2]
Slice b len=5 cap=5 [9 0 0 1 2]
 
Appending 3 to slice b. Here, a new copy is made as the capacity is overloaded.
Slice a len=7 cap=7 [0 0 9 0 0 1 2]
Slice b len=6 cap=12 [9 0 0 1 2 3]
 
Verifying slices a and b point to different underlying arrays after the capacity-overload in the previous step.
b[1] = 8
Slice a len=7 cap=7 [0 0 9 0 0 1 2]
Slice b len=6 cap=12 [9 8 0 1 2 3]

> Here, in the last verification-step, it feels a bit spooky that any
> modification to b is no more causing modification to the underlying
> array pointed to by a. A logical expectation would be that: when b
> hits the limit, a and b both point to the same newly allocated
> underlying array, instead of a continuing to point to the older one.

Having multiple slices pointing to same underlying array, with frequent append operations can get tricky. More on this in the link above.

huangapple
  • 本文由 发表于 2015年1月24日 01:39:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/28115599.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定