Is it correct to declare a slice as *[]Item, given that a slice is a pointer by default?

Question

What is the right way to use a slice in Go? According to the Go documentation, a slice is a pointer by default, so is declaring a slice as *[]Item the right way? Since slices are already pointers by default, doesn't declaring one this way make it a pointer to a pointer?

I feel the right way to create a slice is []Item or []*Item (a slice holding pointers to items).

Answer 1

Score: 8

A bit of theory

Your question makes little sense as asked: there is no "right" or "wrong", no "correct" or "incorrect". You can have a pointer to a slice, you can have a pointer to a pointer to a slice, and you can keep adding such levels of indirection endlessly.

What to do depends on what you need in a particular case.

To help you with the reasoning, I'll try to provide a couple of facts and draw some conclusions.

The first two things to understand about types and values in Go are:

  • Everything in Go, ever, always, is passed by value.

    This covers variable assignments (= and :=), passing values to function and method calls, and the copying of memory that happens internally, such as when reallocating backing arrays of slices or rebalancing maps.

    Passing by value means that actual bits of the value which is assigned are physically copied into the variable which "receives" the value.

  • Types in Go, both built-in and user-defined (including those defined in the standard library), can have either value semantics or reference semantics when it comes to assignment.

    This one is a bit tricky, and often leads to novices incorrectly assuming that the first rule explained above does not hold.

    "The trick" is that if a type contains a pointer (the address of a variable) or consists of a single pointer, the value of this pointer is copied when the value of the type is copied.

    What does this mean?
    Pretty simple: if you assign the value of a variable of type int to another variable of type int, both variables contain identical bits but they are completely independent: change the content of either one, and the other is unaffected.
    If you assign a variable containing a pointer (or consisting of a single pointer) to another one, both will again contain identical bits and are independent in the sense that changing those bits in either of them will not affect the other.
    But since the pointer in both these variables contains the address of the same memory location, using those pointers to modify the contents of the memory location they point at will modify the same memory.
    In other words, the difference is that an int does not reference anything while a pointer naturally references another memory location—because it contains its address.
    Hence, if a type contains at least a single pointer (it may do so by containing a field of another type which itself contains a pointer, and so on—to any nesting level), values of this type will have reference assignment semantics: if you assign a value to another variable, you end up with two values referencing the same memory location.

    That is why maps, slices and strings have reference semantics: when you assign variables of these types, both variables point to the same underlying memory location.
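
The two rules above can be sketched in a few lines. This is only an illustration: the Account type and the helper functions are hypothetical names made up for the example. Copying an int gives an independent value, while copying a struct that holds a pointer gives two values referring to the same memory.

```go
package main

import "fmt"

// Account is a hypothetical type used only for illustration.
// It contains a pointer, so copies of an Account share the pointed-to int.
type Account struct {
	ID      int
	Balance *int
}

// copyInt copies an int and changes the copy; the original is unaffected.
func copyInt(a int) (orig, changed int) {
	b := a // the bits of a are copied into b
	b++
	return a, b
}

// copyAccount copies an Account and writes through the copy's pointer.
// It returns the balance as seen through the original value.
func copyAccount(a Account) int {
	b := a            // the pointer value is copied, not the int it points at
	*b.Balance += 100 // modifies the memory that both a and b refer to
	b.ID = 99         // changes only b's own copy of the ID field
	return *a.Balance
}

func main() {
	orig, changed := copyInt(1)
	fmt.Println(orig, changed) // 1 2

	balance := 10
	acct := Account{ID: 1, Balance: &balance}
	fmt.Println(copyAccount(acct), acct.ID) // 110 1
}
```

Note that acct itself was passed by value in both senses: its ID is untouched, and only the memory behind the shared pointer changed.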

Let's move on to slices.

Slices vs pointers to slices

A slice, logically, is a struct of three fields: a pointer to the slice's backing array which actually contains the slice's elements, and two ints: the capacity of the slice and its length.
When you pass around and assign a slice value, the fields of this struct are copied: a pointer and two integers.
As you can see, when you pass a slice value around, the backing array is not copied, only the pointer to it.
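
A small sketch of what that means in practice, with hypothetical helper names: a function that receives a slice can modify the shared elements, but appending inside the function changes only the function's own copy of the length.

```go
package main

import "fmt"

// setFirst receives a copy of the slice header (pointer, length, capacity).
// The copy points at the same backing array, so the caller sees the write.
func setFirst(s []int, v int) {
	s[0] = v
}

// appendLocal appends to its own copy of the header and returns the new length.
// The caller's header keeps its old length, so the caller does not see the new element.
func appendLocal(s []int, v int) int {
	s = append(s, v)
	return len(s)
}

func main() {
	nums := []int{1, 2, 3}
	setFirst(nums, 42)
	n := appendLocal(nums, 4)
	fmt.Println(nums, len(nums), n) // [42 2 3] 3 4
}
```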

Now let's consider when you want to use a plain slice or a pointer to a slice.

If you're concerned with performance (memory allocation and/or CPU cycles needed to copy memory), these concerns are unfounded: copying three integers when passing around a slice is dirt-cheap on today's hardware.
Using a pointer to a slice would make copying a tiny bit faster (a single machine word rather than three), but these savings will be easily offset by two facts:

  • Taking the address of a slice variable will often make it escape to the heap, so that the compiler can be sure the value survives across function-call boundaries; you then pay for the memory allocator, and the garbage collector has more work.
  • A level of indirection reduces data locality: accessing RAM is slow, so CPUs have caches and prefetch data at the addresses following the one being read. If the control flow immediately reads memory at another location, the prefetched data is thrown away: cache thrashing.

OK, so is there a case when you would want a pointer to a slice?
Yes. For instance, the built-in append function could have been defined as

func append(*[]T, ...T)

instead of

func append([]T, ...T) []T

(N.B. the T here stands for "any type": append is a built-in rather than a library function, so read these signatures as a sort of pseudocode.)

That is, it could accept a pointer to a slice and possibly replace the slice pointed to by the pointer, so you'd call it as append(&slice, element) and not as slice = append(slice, element).
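
For comparison, here is a minimal sketch of the pointer-taking variant, restricted to []int to keep it simple. The name appendInts is hypothetical and not part of any library; the last line of main shows the usual built-in form next to it.

```go
package main

import "fmt"

// appendInts is a hypothetical helper, not part of the standard library.
// It takes a pointer to a slice and replaces the slice the pointer refers to.
func appendInts(dst *[]int, elems ...int) {
	*dst = append(*dst, elems...)
}

func main() {
	var nums []int
	appendInts(&nums, 1, 2) // pointer style: the callee updates nums
	nums = append(nums, 3)  // built-in style: the caller reassigns nums
	fmt.Println(nums)       // [1 2 3]
}
```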

But honestly, in the real-world Go projects I have dealt with, the only use of pointers to slices I can remember was pooling heavily reused slices to save on memory reallocations. And that sole case was only due to [sync.Pool][sync-pool] keeping elements of type interface{}, which can be more efficient to fill with pointers¹.
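
A rough sketch of such a pool follows. The names bufPool and render and the 1024-byte starting capacity are assumptions made up for this example; only sync.Pool with its New field and its Get and Put methods comes from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool of reusable byte slices.
// It stores *[]byte, so putting a buffer back stores a single pointer
// in the interface{} value instead of a copy of the slice header.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 1024) // starting capacity chosen arbitrarily
		return &b
	},
}

// render builds a string in a pooled buffer and returns the buffer afterwards.
func render(name string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the backing array, reset the length
	buf = append(buf, "hello, "...)
	buf = append(buf, name...)
	out := string(buf) // copy the bytes out before the buffer is reused
	*bp = buf          // keep any growth of the backing array
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
}
```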

Slices of values vs slices of pointers to values

Exactly the same logic described above applies to the reasoning about this case.

When you put a value in a slice that value is copied. When the slice needs to grow its backing array, the array will be reallocated, and reallocation means physically copying all existing elements into the new memory location.
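
Both effects can be observed directly. In this sketch the Item type and the function names are hypothetical, and the slice is created with capacity 1 on purpose so that the next append has to reallocate.

```go
package main

import "fmt"

// Item is a hypothetical element type used only for illustration.
type Item struct {
	Name string
	Qty  int
}

// storeCopy puts a value into a slice and then changes the original variable.
// It returns the quantity stored in the slice, which is unaffected.
func storeCopy() int {
	it := Item{Name: "bolt", Qty: 1}
	items := []Item{it} // the slice element is a copy of it
	it.Qty = 100
	return items[0].Qty
}

// stalePointer takes the address of an element, then forces the slice to grow.
// After growth the elements live in a new backing array, so the old pointer
// no longer refers to the element that the slice holds.
func stalePointer() (viaPointer, viaSlice int) {
	items := make([]Item, 1, 1) // capacity 1: the next append must reallocate
	p := &items[0]
	items = append(items, Item{Name: "nut"})
	p.Qty = 7 // writes to the old backing array
	return p.Qty, items[0].Qty
}

func main() {
	fmt.Println(storeCopy()) // 1
	viaPointer, viaSlice := stalePointer()
	fmt.Println(viaPointer, viaSlice) // 7 0
}
```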

So, two considerations:

  • Are the elements reasonably small, so that copying them is not going to put pressure on memory and CPU resources?

    (Note that "small" vs "big" also heavily depends on the frequency of such copying in a working program: copying a couple of megabytes once in a while is not a big deal; copying even tens of kilobytes in a tight time-critical loop can be a big deal.)

  • Is your program OK with multiple copies of the same data? (For instance, values of certain types like [sync.Mutex][sync-mutex] must not be copied after first use.²)

If the answer to either question is "no", you should consider keeping pointers in the slice. But when you consider keeping pointers, also think about the data locality explained above: if a slice contains data intended for time-critical number-crunching, it's better not to make the CPU chase pointers.
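
To make the difference concrete, here is a small sketch with a hypothetical Item type. A slice of values owns its elements, so they are updated in place by index; a slice of pointers lets several elements refer to one shared Item.

```go
package main

import "fmt"

// Item is a hypothetical element type used only for illustration.
type Item struct {
	Name string
	Qty  int
}

// restockValues works on a slice of values. Elements are updated in place
// by index, because a range loop variable would only be a copy.
func restockValues(items []Item) {
	for i := range items {
		items[i].Qty += 10
	}
}

// restockPointers works on a slice of pointers. The loop variable is a copy
// too, but it is a copy of a pointer, so the write reaches the shared Item.
func restockPointers(items []*Item) {
	for _, it := range items {
		it.Qty += 10
	}
}

func main() {
	values := []Item{{Name: "bolt", Qty: 1}}
	restockValues(values)

	shared := &Item{Name: "nut", Qty: 2}
	ptrs := []*Item{shared, shared} // two elements, one underlying Item
	restockPointers(ptrs)

	fmt.Println(values[0].Qty, shared.Qty) // 11 22
}
```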

To recap: when you ask about a "correct" or "right" way of doing something, the question makes no sense unless you specify the criteria by which the possible solutions can be judged. Still, there are considerations to weigh when designing how you are going to store and manipulate data, and I have tried to explain them.

In general, a rule of thumb regarding slices could be:

  • Slices are designed to be passed around "as is"—as values, not pointers to variables containing their values.

    There are legitimate reasons to have pointers to slices, though.

  • Most of the time you keep values in the slice's elements, not pointers to variables with these values.
    Exceptions to this general rule:

    • The values you intend to store in a slice are so large that the envisioned usage pattern for the slice would put excessive pressure on memory.
    • The type of the values you intend to store in a slice requires that they are not copied but only referenced, each existing as a single instance. A good example is a type containing or embedding a field of type [sync.Mutex][sync-mutex] (or, actually, of most other types from the sync package, such as sync.WaitGroup or [sync.Pool][sync-pool], which likewise must not be copied after first use)².
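
The second exception might look like the following sketch. The Counter type and the helper functions are hypothetical; the point is only that the slice holds *Counter, so growing or copying the slice never copies a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical type. It holds a sync.Mutex, so a Counter
// must not be copied once it is in use.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter under the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value reads the counter under the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// newCounters returns a slice of pointers, so growing or copying the slice
// copies only the pointers and never the mutexes.
func newCounters(n int) []*Counter {
	cs := make([]*Counter, n)
	for i := range cs {
		cs[i] = &Counter{}
	}
	return cs
}

// hammer increments every counter once from each of the given goroutines.
func hammer(cs []*Counter, goroutines int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, c := range cs {
				c.Inc()
			}
		}()
	}
	wg.Wait()
}

func main() {
	cs := newCounters(3)
	hammer(cs, 8)
	cs = append(cs, &Counter{}) // may reallocate; only pointers are copied
	fmt.Println(cs[0].Value(), cs[3].Value()) // 8 0
}
```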

A note of caution on correctness vs performance

The text above contains a lot of performance considerations.
I've presented them because Go is a reasonably low-level language: not as low-level as C, C++ or Rust, but it still gives the programmer plenty of wiggle room when performance is at stake.
Still, you should understand very well that at this point on your learning curve correctness must be your top objective, if not the only one. Please take no offence, but if you were tuning some Go code to shave off CPU time, you would not be asking this question in the first place.
In other words, please treat all of the above as a set of facts and considerations to guide you in your learning and exploration of the subject, but do not fall into the trap of thinking about performance first. Make your programs correct and easy to read and modify.


¹ An interface value is a pair of pointers: to the variable containing the value you have put into the interface value and to a special data structure inside the Go runtime which describes the type of that variable.
So while you can put a slice value into a variable of type interface{} directly—in the sense that it's perfectly fine in the language—if the value's type is not itself a single pointer, the compiler will have to allocate on the heap a variable to contain a copy of your value there, and store a pointer to that new variable into the value of type interface{}.
This is needed to uphold the "everything is always passed by value" semantics of Go assignments.
Consequently, if you put a slice value into a variable of type interface{}, you will end up with a copy of that value on the heap.
Because of this, keeping pointers to slices in data structures such as sync.Map makes code uglier but results in less memory churn.

² All synchronization primitives, when compiled down to machine code, work on memory locations: all parts of the running program which need to synchronize on the same primitive are actually using the same address of the memory block representing that primitive. Hence, if you lock a mutex, copy its value to a new variable (that is, to a distinct memory location) and then unlock the copy, the originally locked mutex won't notice, and neither will the other parts of the program which use it for synchronization. That means you have a grave bug in your code.

[sync-mutex]: https://pkg.go.dev/sync#Mutex "sync.Mutex docs"
[sync-pool]: https://pkg.go.dev/sync#Pool "sync.Pool docs"

huangapple
  • Posted on July 14, 2022, 17:58:42
  • If you repost this article, please keep the link: https://go.coder-hub.com/72978660.html