英文:
Performance of function slice parameter vs global variable?
问题
我有以下函数:
func checkFiles(path string, excludedPatterns []string) {
// ...
}
我想知道,由于excludedPatterns
从不改变,我是否应该通过将变量设为全局变量(而不是每次传递给函数)来进行优化,或者Golang是否通过按写时复制(copy-on-write)来处理这个问题?
编辑:我猜我可以将切片作为指针传递,但我仍然想知道关于按写时复制行为(如果存在的话)以及一般情况下是否应该担心按值传递还是按指针传递。
英文:
I've got the following function:
func checkFiles(path string, excludedPatterns []string) {
// ...
}
I'm wondering, since excludedPatterns
never changes, should I optimize it by making the var global (and not passing it to the function every time), or does Golang already handle this by passing them as copy-on-write?
Edit: I guess I could pass the slice as a pointer, but I'm still wondering about the copy-on-write behavior (if it exists) and whether, in general, I should worry about passing by value or by pointer.
答案1
得分: 21
从你的函数名来看,性能似乎并不那么重要,甚至不考虑将参数移动到全局变量中以节省传递参数所需的时间/空间(与调用函数和传递值相比,IO操作(如检查文件)要慢得多)。
在Go中,切片只是小型描述符,类似于具有指向支持数组和2个int
的指针的结构体,即长度和容量。无论支持数组有多大,传递切片始终是高效的,你甚至不应该考虑传递指向它们的指针,除非你想修改切片头。
在Go中,参数始终按值传递,并且会复制传递的值。如果传递一个指针,那么指针值将被复制并传递。当传递一个切片时,切片值(一个小型描述符)将被复制并传递 - 它将指向相同的支持数组(不会被复制)。
此外,如果你需要在函数中多次访问切片,将其作为参数通常是额外的收益,因为编译器可以进行进一步的优化/缓存,而如果它是一个全局变量,则需要更多的注意。
关于切片及其内部的更多信息:Go切片:用法和内部原理
如果你想要关于性能的确切数字,请进行基准测试!
下面是一个小型基准测试代码,展示了传递切片作为参数和访问全局切片这两种解决方案之间没有区别。将其保存到一个名为slices_test.go
的文件中,并使用go test -bench .
运行它。
package main
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = gslice // 访问全局切片
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = ss // 访问参数切片
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
示例输出:
testing: warning: no tests to run
PASS
BenchmarkParameter-2 30000000 55.4 ns/op
BenchmarkGlobal-2 30000000 55.1 ns/op
ok _/V_/workspace/IczaGo/src/play 3.569s
英文:
Judging from the name of your function, performance can't be that critical to even consider moving parameters to global variables just to save time/space required to pass them as parameters (IO operations like checking files are much-much slower than calling functions and passing values to them).
Slices in Go are just small descriptors, something like a struct with a pointer to a backing array and 2 int
s, a length and capacity. No matter how big the backing array is, passing slices are always efficient and you shouldn't even consider passing a pointer to them, unless you want to modify the slice header of course.
Parameters in Go are always passed by value, and a copy of the value being passed is made. If you pass a pointer, then the pointer value will be copied and passed. When a slice is passed, the slice value (which is a small descriptor) will be copied and passed - which will point to the same backing array (which will not be copied).
Also if you need to access the slice multiple times in the function, a parameter is usually an extra gain as compilers can make further optimization / caching, while if it is a global variable, more care has to be taken.
More about slices and their internals: Go Slices: usage and internals
And if you want exact numbers on performance, benchmark!
Here comes a little benchmarking code which shows no difference between the 2 solutions (passing slice as argument or accessing a global slice). Save it into a file like slices_test.go
and run it with go test -bench .
package main
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
Example output:
testing: warning: no tests to run
PASS
BenchmarkParameter-2 30000000 55.4 ns/op
BenchmarkGlobal-2 30000000 55.1 ns/op
ok _/V_/workspace/IczaGo/src/play 3.569s
答案2
得分: 2
在@icza的优秀答案基础上,还有另一种将切片作为参数传递的方法:传递切片的指针。
当你需要修改底层切片时,全局变量切片可以工作,但将其作为参数传递不起作用,你实际上是在使用副本。为了缓解这个问题,可以将切片作为指针传递。
有趣的是,这实际上比访问全局变量更快:
package main
import (
"testing"
)
var gslice = make([]string, 1000000)
func global(s string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = gslice // 访问全局切片
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = ss // 访问参数切片
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = ss // 访问参数切片
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkParameterPointer(b *testing.B) {
for i := 0; i < b.N; i++ {
paramPointer("hi", &gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
结果:
goos: darwin
goarch: amd64
pkg: untitled
BenchmarkParameter-8 24437403 48.2 ns/op
BenchmarkParameterPointer-8 27483115 40.3 ns/op
BenchmarkGlobal-8 25631470 46.0 ns/op
英文:
Piggybacking on @icza's excellent answer, there is another way to pass a slice as parameter: a pointer to a slice.
When you need to modify the underlying slice, the global variable slice works, but passing it as a parameter does not work, you are effectively working with a copy. To mitigate that, one can actually pass the slice as a pointer.
Interesting enough, it's actually faster than accessing a global variable:
package main
import (
"testing"
)
var gslice = make([]string, 1000000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkParameterPointer(b *testing.B) {
for i := 0; i < b.N; i++ {
paramPointer("hi", &gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
Results:
goos: darwin
goarch: amd64
pkg: untitled
BenchmarkParameter-8 24437403 48.2 ns/op
BenchmarkParameterPointer-8 27483115 40.3 ns/op
BenchmarkGlobal-8 25631470 46.0 ns/op
答案3
得分: 2
我重写了基准测试,这样你就可以比较结果了。
从结果可以看出,在10个记录之后,ParameterPointer的性能开始超越其他方法,这非常有趣。
package slices_bench
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = gslice // 访问全局切片
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = ss // 访问参数切片
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // 循环多次访问切片
_ = s
_ = ss // 访问参数切片
}
}
func BenchmarkPerformance(b *testing.B){
fixture := []struct {
desc string
records int
}{
{
desc: "1个记录",
records: 1,
},
{
desc: "10个记录",
records: 10,
},
{
desc: "100个记录",
records: 100,
},
{
desc: "1000个记录",
records: 1000,
},
{
desc: "10000个记录",
records: 10000,
},
{
desc: "100000个记录",
records: 100000,
},
}
tests := []struct {
desc string
fn func(b *testing.B, n int)
}{
{
desc: "ParameterPointer",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
paramPointer("hi", &gslice)
}
},
},
{
desc: "Parameter",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
param("hi", gslice)
}
},
},
{
desc: "Global",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
global("hi")
}
},
},
}
for _, t := range tests {
b.Run(t.desc, func(b *testing.B) {
for _, f := range fixture {
b.Run(f.desc, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
t.fn(b, f.records)
}
})
}
})
}
}
结果:
goos: windows
goarch: amd64
pkg: benchs/slices-bench
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkPerformance
BenchmarkPerformance/ParameterPointer
BenchmarkPerformance/ParameterPointer/1_record
BenchmarkPerformance/ParameterPointer/1_record-16 38661910 31.18 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10_records
BenchmarkPerformance/ParameterPointer/10_records-16 4160023 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100_records
BenchmarkPerformance/ParameterPointer/100_records-16 445131 2748 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/1000_records
BenchmarkPerformance/ParameterPointer/1000_records-16 43876 27380 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10000_records
BenchmarkPerformance/ParameterPointer/10000_records-16 4441 273922 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100000_records
BenchmarkPerformance/ParameterPointer/100000_records-16 439 2739282 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter
BenchmarkPerformance/Parameter/1_record
BenchmarkPerformance/Parameter/1_record-16 39860619 30.79 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10_records
BenchmarkPerformance/Parameter/10_records-16 4152728 288.6 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100_records
BenchmarkPerformance/Parameter/100_records-16 445634 2757 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/1000_records
BenchmarkPerformance/Parameter/1000_records-16 43618 27496 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10000_records
BenchmarkPerformance/Parameter/10000_records-16 4450 273960 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100000_records
BenchmarkPerformance/Parameter/100000_records-16 435 2739053 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global
BenchmarkPerformance/Global/1_record
BenchmarkPerformance/Global/1_record-16 38813095 30.97 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10_records
BenchmarkPerformance/Global/10_records-16 4148433 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100_records
BenchmarkPerformance/Global/100_records-16 429274 2758 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/1000_records
BenchmarkPerformance/Global/1000_records-16 43591 27412 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10000_records
BenchmarkPerformance/Global/10000_records-16 4521 274420 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100000_records
BenchmarkPerformance/Global/100000_records-16 436 2751042 ns/op 0 B/op 0 allocs/op
英文:
I rewrote the benchmark so you can compare the results.
As you can see the ParameterPointer bench start to get ahead after 10 records. Which is very interesting.
package slices_bench
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkPerformance(b *testing.B){
fixture := []struct {
desc string
records int
}{
{
desc: "1 record",
records: 1,
},
{
desc: "10 records",
records: 10,
},
{
desc: "100 records",
records: 100,
},
{
desc: "1000 records",
records: 1000,
},
{
desc: "10000 records",
records: 10000,
},
{
desc: "100000 records",
records: 100000,
},
}
tests := []struct {
desc string
fn func(b *testing.B, n int)
}{
{
desc: "ParameterPointer",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
paramPointer("hi", &gslice)
}
},
},
{
desc: "Parameter",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
param("hi", gslice)
}
},
},
{
desc: "Global",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
global("hi")
}
},
},
}
for _, t := range tests {
b.Run(t.desc, func(b *testing.B) {
for _, f := range fixture {
b.Run(f.desc, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
t.fn(b, f.records)
}
})
}
})
}
}
Results:
goos: windows
goarch: amd64
pkg: benchs/slices-bench
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkPerformance
BenchmarkPerformance/ParameterPointer
BenchmarkPerformance/ParameterPointer/1_record
BenchmarkPerformance/ParameterPointer/1_record-16 38661910 31.18 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10_records
BenchmarkPerformance/ParameterPointer/10_records-16 4160023 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100_records
BenchmarkPerformance/ParameterPointer/100_records-16 445131 2748 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/1000_records
BenchmarkPerformance/ParameterPointer/1000_records-16 43876 27380 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10000_records
BenchmarkPerformance/ParameterPointer/10000_records-16 4441 273922 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100000_records
BenchmarkPerformance/ParameterPointer/100000_records-16 439 2739282 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter
BenchmarkPerformance/Parameter/1_record
BenchmarkPerformance/Parameter/1_record-16 39860619 30.79 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10_records
BenchmarkPerformance/Parameter/10_records-16 4152728 288.6 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100_records
BenchmarkPerformance/Parameter/100_records-16 445634 2757 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/1000_records
BenchmarkPerformance/Parameter/1000_records-16 43618 27496 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10000_records
BenchmarkPerformance/Parameter/10000_records-16 4450 273960 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100000_records
BenchmarkPerformance/Parameter/100000_records-16 435 2739053 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global
BenchmarkPerformance/Global/1_record
BenchmarkPerformance/Global/1_record-16 38813095 30.97 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10_records
BenchmarkPerformance/Global/10_records-16 4148433 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100_records
BenchmarkPerformance/Global/100_records-16 429274 2758 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/1000_records
BenchmarkPerformance/Global/1000_records-16 43591 27412 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10000_records
BenchmarkPerformance/Global/10000_records-16 4521 274420 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100000_records
BenchmarkPerformance/Global/100000_records-16 436 2751042 ns/op 0 B/op 0 allocs/op
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论