Pointers vs. values in parameters and return values

Question

In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:

type MyStruct struct {
    Val int
}

func myfunc() MyStruct {
    return MyStruct{Val: 1}
}

func myfunc() *MyStruct {
    return &MyStruct{}
}

func myfunc(s *MyStruct) {
    s.Val = 1
}

I understand the differences between these. The first returns a copy of the struct, the second a pointer to the struct value created within the function, the third expects an existing struct to be passed in and overrides the value.

I've seen all of these patterns used in various contexts, and I'm wondering what the best practices are. When would you use which? For instance, the first one could be OK for small structs (because the overhead is minimal), the second for bigger ones. And the third if you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?

Similarly, the same question regarding slices:

func myfunc() []MyStruct {
    return []MyStruct{ MyStruct{Val: 1} }
}

func myfunc() []*MyStruct {
    return []*MyStruct{ &MyStruct{Val: 1} }
}

func myfunc(s *[]MyStruct) {
    *s = []MyStruct{ MyStruct{Val: 1} }
}

func myfunc(s *[]*MyStruct) {
    *s = []*MyStruct{ &MyStruct{Val: 1} }
}

Again: what are the best practices here? I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values or a slice of pointers to structs, and should I pass in a pointer to a slice as an argument (a pattern used in the Go App Engine API)?

Answer 1

Score: 583

tl;dr:

  • Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
  • Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
  • Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.

One case where you should often use a pointer:

  • Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
    • Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.

Some situations where you don't need pointers:

  • Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.

    • Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
    • Passing small structs by value can be more efficient by avoiding cache misses or heap allocations. In any case, when pointers and values perform similarly, the Go-y approach is to choose whatever provides the more natural semantics rather than squeeze out every last bit of speed.
    • So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
    • If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out to be a performance win, but the rule of thumb is not to.
  • For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.

  • For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.

    • It's not always practical to follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
  • Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).

    • You still may need to pass pointers in the rarer case that you want to modify the caller's struct: flag.StringVar takes a *string for that reason, for example.
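A minimal sketch of the slice, map, and *string cases above. The helpers zeroOut, addKey, withItem, and setName are hypothetical names chosen for illustration (they are not from the answer or the standard library); setName only imitates the shape of flag.StringVar. The block is package main without a main function, so call the helpers from your own code.

```go
package main

// zeroOut overwrites every element in place. The slice header is passed by
// value, but the copy still points at the caller's backing array, so the
// caller sees the change (the io.Reader.Read pattern).
func zeroOut(p []byte) {
	for i := range p {
		p[i] = 0
	}
}

// addKey mutates a map without taking a pointer: a map value already
// refers to shared underlying data.
func addKey(m map[string]int, k string) {
	m[k]++
}

// withItem follows the append convention: accept a slice value and return
// the (possibly reallocated) result for the caller to reassign.
func withItem(s []int, v int) []int {
	return append(s, v)
}

// setName mimics the flag.StringVar shape: it takes *string because it must
// change the caller's variable itself, not just data the variable refers to.
func setName(dst *string, v string) {
	*dst = v
}
```

For example, after zeroOut(buf) the caller's buf holds only zeros, while withItem must be used as s = withItem(s, 2), exactly like append, because the caller's own slice header is not updated.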

Where you use pointers:

  • Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.

  • Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).

  • You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:

    1. For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
    2. Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
    3. Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
    4. Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
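The Reset method, the LoadFromJSON / NewUserFromJSON pair, and sync.Pool might fit together roughly as below. This is a sketch under assumptions: User and its fields are hypothetical, the method bodies are one possible implementation of the names the answer mentions, and encodeName is an invented helper. Only encoding/json, bytes.Buffer, and sync.Pool are real library APIs.

```go
package main

import (
	"bytes"
	"encoding/json"
	"sync"
)

// User is a hypothetical type used only to illustrate the API shapes.
type User struct {
	Name  string   `json:"name"`
	Roles []string `json:"roles"`
}

// Reset puts u back in a blank state but keeps the Roles backing array,
// so a later load can reuse it instead of allocating.
func (u *User) Reset() {
	u.Name = ""
	u.Roles = u.Roles[:0]
}

// LoadFromJSON is the modify-in-place half of the pair. Reset first, because
// json.Unmarshal leaves fields that are absent from the input untouched.
func (u *User) LoadFromJSON(data []byte) error {
	u.Reset()
	return json.Unmarshal(data, u)
}

// NewUserFromJSON is the create-from-scratch convenience wrapper for callers
// who don't care about reuse.
func NewUserFromJSON(data []byte) (*User, error) {
	u := &User{}
	if err := u.LoadFromJSON(data); err != nil {
		return nil, err
	}
	return u, nil
}

// bufPool recycles buffers for callers who have measured allocation
// pressure; everyone else can ignore it.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeName borrows a buffer from the pool and returns it when done.
func encodeName(u *User) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	buf.WriteString("user:")
	buf.WriteString(u.Name)
	return buf.String() // String copies the bytes, so reusing buf later is safe
}
```

The design point is that the simple constructor stays the default, and only callers who opt in deal with Reset or the pool.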

Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:

  • The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
  • The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
  • Copying or moving the values might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to slice items from before the append may not point to where the item was copied after, copying can be slower for huge structs, and for e.g. sync.Mutex copying isn't allowed. Insert/delete in the middle and sorting also move items around so similar considerations can apply.
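To make the append pitfall concrete, here is a small sketch with a hypothetical Item type. The slice starts with len == cap, so append is forced to allocate a new array; a pointer taken beforehand keeps pointing at the old copy and misses later writes. The hypothetical Guarded type shows why a struct containing a sync.Mutex is usually stored in a slice of pointers.

```go
package main

import "sync"

// Item is a hypothetical element type.
type Item struct {
	Val int
}

// staleAfterGrow returns the value seen through an old pointer and the value
// seen through the slice after append has reallocated the backing array.
func staleAfterGrow() (viaPointer, viaSlice int) {
	items := make([]Item, 1, 1) // len == cap: the next append must reallocate
	p := &items[0]              // points into the original array
	items = append(items, Item{})
	items[0].Val = 42 // writes to the new array only
	return p.Val, items[0].Val
}

// Guarded embeds a mutex, which must not be copied after first use.
// Keeping []*Guarded means append copies pointers, never the mutex.
type Guarded struct {
	mu sync.Mutex
	n  int
}

// Inc increments n under the lock.
func (g *Guarded) Inc() {
	g.mu.Lock()
	g.n++
	g.mu.Unlock()
}
```

Here staleAfterGrow returns 0 and 42: the old pointer and the slice element have silently diverged.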

Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're confident that's OK (no/careful use of pointers to items, and items are small or you've measured the perf impact). Sometimes it comes down to something more specific to your situation, but that's a rough guide.

Answer 2

Score: 29

If you can (e.g. for a non-shared resource that does not need to be passed as a reference), use a value, for the following reasons:

  1. Your code will be nicer and more readable, avoiding pointer operators and nil checks.
  2. Your code will be safer against nil pointer dereference panics.
  3. Your code will often be faster: yes, faster! Why?

Reason 1: you will allocate fewer items on the heap. Allocating/deallocating on the stack is practically free, but allocating/deallocating on the heap can be very expensive (allocation time + garbage collection). You can see some basic numbers here: http://www.macias.info/entry/201802102230_go_values_vs_references.md
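One way to check the heap-allocation claim yourself is testing.AllocsPerRun, which can be called outside go test. This sketch assumes a small hypothetical Point type, and uses //go:noinline plus package-level sink variables so the compiler cannot optimize the calls away. On a typical build the value version should report 0 allocations per call and the pointer version 1, but exact results depend on the compiler's escape analysis.

```go
package main

import "testing"

// Point is a hypothetical small struct (two machine words on 64-bit).
type Point struct {
	Lat, Long float64
}

// Package-level sinks keep the results observable so the calls aren't removed.
var (
	sinkVal Point
	sinkPtr *Point
)

// newPointVal returns a copy; nothing needs to live on the heap.
//
//go:noinline
func newPointVal() Point { return Point{1, 2} }

// newPointPtr returns a pointer, so the Point escapes and is heap-allocated.
//
//go:noinline
func newPointPtr() *Point { return &Point{1, 2} }

// allocs reports the average number of heap allocations per call for each style.
func allocs() (val, ptr float64) {
	val = testing.AllocsPerRun(100, func() { sinkVal = newPointVal() })
	ptr = testing.AllocsPerRun(100, func() { sinkPtr = newPointPtr() })
	return val, ptr
}
```

Allocation counts say nothing about wall-clock time on their own; for that, a benchmark of your real workload is still needed.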

Reason 2: especially if you store returned values in slices, your objects will be more compact in memory: looping over a slice where all the items are contiguous is much faster than iterating over a slice where all the items are pointers to other parts of memory. This is not because of the extra indirection step but because of the increase in cache misses.

Myth breaker: a typical x86 cache line is 64 bytes. Most structs are smaller than that. Copying a cache line in memory takes about as long as copying a pointer.

Only if a critical part of your code is slow would I try some micro-optimization and check whether using pointers improves the speed somewhat, at the cost of readability and maintainability.

Answer 3

Score: 22

There are three main reasons you would want to use pointers as method receivers:

  1. "First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer."

  2. "Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver."

  3. "Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used."

Reference: https://golang.org/doc/faq#methods_on_values_or_pointers
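A minimal sketch of the first and third reasons, using a hypothetical Counter type: the value-receiver method only increments a copy, while the pointer-receiver method changes the caller's value. IncValue exists purely as a contrast; following the consistency rule, a real Counter would give every method a pointer receiver.

```go
package main

// Counter is a hypothetical type for illustrating receiver choice.
type Counter struct {
	n int
}

// IncValue has a value receiver: it increments a copy, so the caller's
// Counter is unchanged. Shown only as a counterexample.
func (c Counter) IncValue() { c.n++ }

// Inc has a pointer receiver, so it modifies the caller's Counter.
func (c *Counter) Inc() { c.n++ }

// Value only reads, but uses a pointer receiver for consistency with Inc.
func (c *Counter) Value() int { return c.n }
```

With var c Counter, calling c.IncValue() leaves c.Value() at 0, while c.Inc() (Go takes &c automatically because c is addressable) raises it to 1.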

Edit: Another important thing is to know the actual "type" that you are sending to the function. The type can be either a 'value type' or a 'reference type'.

Even though slices and maps act as references, we might want to pass them as pointers in scenarios like changing the length of the slice in the function.
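For the slice-length case, a sketch with hypothetical helpers: a plain []int parameter lets a function change elements but not the caller's length, whereas a *[]int parameter can shrink or grow the caller's slice.

```go
package main

// truncateCopy receives a copy of the slice header, so changing its
// length is invisible to the caller.
func truncateCopy(s []int) {
	s = s[:0]
}

// truncate receives a pointer to the caller's slice header, so the new
// length sticks.
func truncate(s *[]int) {
	*s = (*s)[:0]
}

// pushBack appends through the pointer, updating the caller's slice even
// if append reallocates.
func pushBack(s *[]int, v int) {
	*s = append(*s, v)
}
```

Returning the new slice, as append does, is the usual alternative when the caller can simply reassign; the pointer form mostly pays off when the function cannot return the slice, as in the reflection-based serializers mentioned in Answer 1.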

Answer 4

Score: 9

A case where you generally need to return a pointer is when constructing an instance of some stateful or shareable resource. This is often done by functions prefixed with New.

Because they represent a specific instance of something and they may need to coordinate some activity, it doesn't make a lot of sense to generate duplicated/copied structures representing the same resource; the returned pointer acts as the handle to the resource itself.

Some examples:

  • func NewTLSServer(handler http.Handler) *Server - instantiates a web server for testing
  • func Open(name string) (*File, error) - returns a file access handle

In other cases, pointers are returned just because the structure may be too large to copy by default:

  • func NewRGBA(r Rectangle) *RGBA - allocates an image in memory

Alternatively, returning pointers directly could be avoided by instead returning a copy of a structure that contains the pointer internally, but maybe this isn't considered idiomatic:

  • No examples found in the standard library...
  • Related question: https://stackoverflow.com/q/28501976
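Both shapes can be sketched with a hypothetical Conn type: NewConn returns a pointer that acts as the handle to one shared resource, and Handle is the "copy of a struct that contains the pointer" alternative, where copies still share the same underlying Conn. Neither type comes from the standard library.

```go
package main

import "sync"

// Conn is a hypothetical stateful resource: its mutex and counter must be
// shared by every user, so copying the struct itself would be a bug.
type Conn struct {
	mu   sync.Mutex
	sent int
}

// NewConn returns a pointer that serves as the handle to the resource.
func NewConn() *Conn { return &Conn{} }

// Send records one message under the lock.
func (c *Conn) Send() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sent++
}

// Sent reports how many messages were recorded.
func (c *Conn) Sent() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.sent
}

// Handle is a small value type wrapping the pointer: copying a Handle
// copies only the pointer, so all copies refer to the same Conn.
type Handle struct {
	c *Conn
}

// NewHandle returns a Handle by value.
func NewHandle() Handle { return Handle{c: NewConn()} }

// Send forwards to the shared Conn.
func (h Handle) Send() { h.c.Send() }

// Sent forwards to the shared Conn.
func (h Handle) Sent() int { return h.c.Sent() }
```

The Handle shape hides the pointer from callers, but it also hides the fact that copies alias each other, which may be why it is rarely seen in the standard library.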

Answer 5

Score: 4

Regarding struct vs. pointer return values, I got confused after reading many highly starred open source projects on GitHub, as there are many examples of both cases, until I found this amazing article:
https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html

"In general, share struct type values with a pointer unless the struct type has been implemented to behave like a primitive data value.

If you are still not sure, this is another way to think about. Think of every struct as having a nature. If the nature of the struct is something that should not be changed, like a time, a color or a coordinate, then implement the struct as a primitive data value. If the nature of the struct is something that can be changed, even if it never is in your program, it is not a primitive data value and should be implemented to be shared with a pointer. Don’t create structs that have a duality of nature."

Completely convinced.
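The article's "nature" test could be sketched like this, assuming a hypothetical Color (primitive-like: methods return new values and never modify the receiver) and a hypothetical Account (changeable by nature: created by a New function and shared by pointer).

```go
package main

// Color behaves like a primitive data value: "changing" it produces a new one.
type Color struct {
	R, G, B uint8
}

// Darker returns a new Color; the receiver is left untouched.
func (c Color) Darker() Color {
	return Color{c.R / 2, c.G / 2, c.B / 2}
}

// Account is something that changes over time, so it is shared via a pointer.
type Account struct {
	Owner   string
	Balance int
}

// NewAccount returns a pointer so every holder sees the same account.
func NewAccount(owner string) *Account {
	return &Account{Owner: owner}
}

// Deposit mutates the shared account.
func (a *Account) Deposit(amount int) {
	a.Balance += amount
}
```

Neither type mixes the two natures: Color has no pointer-receiver methods, and Account is never passed around by value.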

huangapple
  • This article was published on May 8, 2014 at 21:21:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/23542989.html