Speed of Concatenation of string variables in Go

Question

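I have seen a question on this site about the speed of string concatenation. In that thread, people posted some ad-hoc benchmarks with odd-looking numbers: https://stackoverflow.com/questions/1760757/how-to-efficiently-concatenate-strings-in-go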

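I decided to check those speeds and wrote my own test. It shows different results: for large data sizes, the "+" operator is faster than the other methods. Is that right?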

Here is my code:

package main

import (
    "bytes"
    "fmt"
    "runtime/debug"
    "time"
)

const variations = 30

var time1, time2 time.Time
var delta, catcher string
var x, deltaSize, k, dataSize, operations uint64
var i, j, x_min uint64
var l int
var delta_byte []byte
var method1Speed, method2Speed, method3Speed, method3ASpeed, method2ASpeed [variations]uint64
var dataTotal [variations]uint64
var tmp []byte

func main() {

    x_min = 2
    operations = 1

    for x = variations; x >= x_min; x = x - 2 {
        deltaSize = 1 << x // 2^x
        dataSize = operations * deltaSize
        dataTotal[x-1] = dataSize

        fmt.Println("Step #", x, "delta=", deltaSize, "op.=", operations, "data=", dataSize)
        fmt.Println("Preparing Data...")
        delta_byte = make([]byte, deltaSize)
        for i = 0; i < deltaSize; i++ {
            delta_byte[i] = 255
        }
        delta = string(delta_byte)

        delta_byte = nil
        catcher = ""
        debug.FreeOSMemory()

        fmt.Println("Testing Method #1...")
        time1 = time.Now()
        for j = 1; j <= operations; j++ {
            //----------------------------
            catcher += delta
            //----------------------------
        }
        time2 = time.Now()
        method1Speed[x-1] = uint64((1000000 * float64(dataSize)) / float64(time2.Sub(time1).Nanoseconds())) // KiB/sec.

        catcher = ""
        debug.FreeOSMemory()

        fmt.Println("Testing Method #2...")
        time1 = time.Now()
        for j = 1; j <= operations; j++ {
            //----------------------------
            stringsJoinViaCopy(&catcher, &catcher, &delta)
            //----------------------------
        }
        time2 = time.Now()
        method2Speed[x-1] = uint64((1000000 * float64(dataSize)) / float64(time2.Sub(time1).Nanoseconds())) // KiB/sec.

        catcher = ""
        debug.FreeOSMemory()

        fmt.Println("Testing Method #3...")
        time1 = time.Now()
        for j = 1; j <= operations; j++ {
            //----------------------------
            stringsJoinViaBuffer(&catcher, &catcher, &delta)
            //----------------------------
        }
        time2 = time.Now()
        method3Speed[x-1] = uint64((1000000 * float64(dataSize)) / float64(time2.Sub(time1).Nanoseconds())) // KiB/sec.

        catcher = ""
        debug.FreeOSMemory()

        fmt.Println("Testing Method #3A...")
        time1 = time.Now()
        buffer := bytes.NewBuffer(nil)
        for j = 1; j <= operations; j++ {
            //----------------------------
            buffer.WriteString(delta)
            //----------------------------
        }
        catcher = buffer.String()
        time2 = time.Now()
        method3ASpeed[x-1] = uint64((1000000 * float64(dataSize)) / float64(time2.Sub(time1).Nanoseconds())) // KiB/sec.

        catcher = ""
        debug.FreeOSMemory()

        fmt.Println("Testing Method #2A...")
        time1 = time.Now()
        tmp = make([]byte, int(operations)*len(delta)) // Cheating (guessing) with size
        l = 0
        for j = 1; j <= operations; j++ {
            //----------------------------
            l += copy(tmp[l:], delta)
            //----------------------------
        }
        catcher = string(tmp)
        time2 = time.Now()
        method2ASpeed[x-1] = uint64((1000000 * float64(dataSize)) / float64(time2.Sub(time1).Nanoseconds())) // KiB/sec.

        catcher = ""
        delta = ""
        debug.FreeOSMemory()

        ///
        operations *= 2
    }

    // Show Results
    fmt.Println("Step, ops, total data (B), speed (KiB/s): M1 M2 M3 M3A M2A")
    for x = x_min; x <= variations; x = x + 2 {
        // The measurement loop doubled `operations` every time x dropped by 2,
        // so step x ran 2^((variations-x)/2) appends of 2^x bytes each.
        operations = 1 << ((variations - x) / 2)
        fmt.Println(x, operations, dataTotal[x-1], method1Speed[x-1], method2Speed[x-1], method3Speed[x-1],
            method3ASpeed[x-1], method2ASpeed[x-1])
    }
}

//------------------------------------------------------------------------------

func stringsJoinViaBuffer(dest, a, b *string) {

    // Joins two strings (a & b) using Buffer and puts them into dest.

    buffer := bytes.NewBuffer(nil)
    buffer.WriteString(*a)
    buffer.WriteString(*b)

    *dest = buffer.String()
}

//------------------------------------------------------------------------------

func stringsJoinViaCopy(dest, a, b *string) {
    x := make([]byte, len(*a)+len(*b))
    i := 0
    i += copy(x[i:], *a)
    i += copy(x[i:], *b)

    *dest = string(x)
}

Here are the results. Speeds are in KiB/s as computed by the code. M1: catcher += delta; M2: stringsJoinViaCopy; M3: stringsJoinViaBuffer; M3A: one bytes.Buffer for all appends; M2A: one preallocated []byte filled with copy.

Step    Ops  Total data, B            M1      M2      M3     M3A     M2A
   2  16384          65536           236     109      57  108413  301653
   4   8192         131072           464     227     113  251519  576660
   6   4096         262144           895     410     202  225300  626165
   8   2048         524288          1514     672     351  205068  552088
  10   1024        1048576          3187    1412     756  207588  532239
  12    512        2097152          7980    3238    1727  209447  592230
  14    256        4194304         16361    6553    3641  230521  536320
  16    128        8388608         29568   12170    6835  241752  604050
  18     64       16777216         55158   23950   13549  238039  563997
  20     32       33554432         98348   43400   25958  216947  521189
  22     16       67108864        168906   80442   48725  231806  534722
  24      8      134217728        299127  129035   89686  254403  519534
  26      4      268435456        529730  207405  153894  284578  506730
  28      2      536870912       1167316  353510  268546  359990  523471
  30      1     1073741824  909950698305  503703  581848  572763  579852

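It seems that the alternatives to "+" only pay off when a lot of data is appended continuously, or when you can cheat by guessing the final size in advance. Is that correct? If strings are concatenated only occasionally, is a simple "+" better? In the linked question, people seem to have measured raw byte copying rather than a real-world task.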
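
A rough cost model helps when reading the M1 column. Go strings are immutable, so each catcher += delta allocates a new string and copies everything accumulated so far: n appends of d bytes copy roughly d*n*(n+1)/2 bytes, about (n+1)/2 times the size of the result. The sketch below only does that arithmetic for two steps of the table (operations = total data / 2^x). The function name is hypothetical, and the formula is a slight overestimate because the first append to an empty string may skip its copy.

```go
package main

import "fmt"

// bytesCopiedByPlus estimates the bytes copied by n successive `s += delta`
// statements with len(delta) == d. The i-th append builds a fresh string of
// i*d bytes, giving d*n*(n+1)/2 in total. (The very first append to an empty
// string may skip its copy, so this is a slight overestimate.)
func bytesCopiedByPlus(n, d uint64) uint64 {
	return d * n * (n + 1) / 2
}

func main() {
	// Two steps from the table, as {operations, delta size in bytes}.
	steps := [][2]uint64{{16384, 4}, {128, 65536}} // steps 2 and 16
	for _, st := range steps {
		n, d := st[0], st[1]
		copied := bytesCopiedByPlus(n, d)
		fmt.Printf("n=%d d=%d result=%d B copied=%d B (%.1fx the result)\n",
			n, d, n*d, copied, float64(copied)/float64(n*d))
	}
}
```

Both steps copy roughly the same absolute amount, but at step 2 that work produces only a 64 KiB result, which is consistent with M1 looking far slower at the top of the table than at the bottom.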

Oddly, at step #26 the "+" operator is faster even than the size-guessing method (M2A)!

Answer 1

Score: 4

Here's a Go benchmark starter kit.

concat_test.go:

package main

import (
	"bytes"
	"strconv"
	"strings"
	"testing"
)

func BenchmarkConcat(b *testing.B) {
	var s string
	for n := 1; n <= 1<<12; n <<= 3 {
		s1 := strings.Repeat("a", n)
		s2 := strings.Repeat("b", n)

		b.Run("PlusL"+strconv.Itoa(n), func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				s = s1 + s2
			}
			b.StopTimer()
		},
		)

		b.Run("CopyL"+strconv.Itoa(n), func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				buf := make([]byte, len(s1)+len(s2))
				copy(buf[copy(buf, s1):], s2)
				s = string(buf)
			}
			b.StopTimer()
		},
		)

		b.Run("BufferL"+strconv.Itoa(n), func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				var buf bytes.Buffer
				buf.WriteString(s1)
				buf.WriteString(s2)
				s = buf.String()
			}
			b.StopTimer()
		},
		)

	}
	_ = s
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: so/concat
BenchmarkConcat/PlusL1-4       30000000    55.9 ns/op   2 B/op   1 allocs/op
BenchmarkConcat/CopyL1-4       30000000    63.0 ns/op   4 B/op   2 allocs/op
BenchmarkConcat/BufferL1-4     10000000   115 ns/op  114 B/op   2 allocs/op
BenchmarkConcat/PlusL8-4       20000000    78.1 ns/op  16 B/op   1 allocs/op
BenchmarkConcat/CopyL8-4       20000000    99.2 ns/op  32 B/op   2 allocs/op
BenchmarkConcat/BufferL8-4     10000000   131 ns/op  128 B/op   2 allocs/op
BenchmarkConcat/PlusL64-4      20000000    85.3 ns/op  128 B/op   1 allocs/op
BenchmarkConcat/CopyL64-4      10000000   125 ns/op  256 B/op   2 allocs/op
BenchmarkConcat/BufferL64-4     5000000   328 ns/op  432 B/op   3 allocs/op
BenchmarkConcat/PlusL512-4      5000000   249 ns/op 1024 B/op   1 allocs/op
BenchmarkConcat/CopyL512-4      3000000   457 ns/op 2048 B/op   2 allocs/op
BenchmarkConcat/BufferL512-4    1000000  1012 ns/op 3184 B/op   4 allocs/op
BenchmarkConcat/PlusL4096-4    1000000  1527 ns/op 8192 B/op   1 allocs/op
BenchmarkConcat/CopyL4096-4     500000  3132 ns/op 16384 B/op   2 allocs/op
BenchmarkConcat/BufferL4096-4   300000  4863 ns/op 24688 B/op   4 allocs/op
PASS
ok   so/concat 24.308s
$

Answer 2

Score: 0

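One thing you're doing wrong is how you benchmark the buffer version. You allocate a new buffer on every iteration; instead, create the buffer once, keep writing to it until you're done, and only then retrieve the result. Otherwise, why use a buffer at all?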

buf := bytes.NewBuffer([]byte(catcher))
for j = 1; j <= operations; j++ {
    //----------------------------
    buf.WriteString(delta)
    //----------------------------
}
catcher = buf.String()

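Your stringsJoinViaCopy also allocates a new byte slice every single time, which is unnecessary. Moreover, copy only makes more sense than bytes.Buffer when you know the size of the string beforehand, since Buffer already uses copy underneath, together with a growth heuristic for the underlying byte slice.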
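
When the final size is known up front, the same idea can be sketched with strings.Builder (a swapped-in type, added in Go 1.10, after this answer was written): reserve the capacity once with Grow, then append. Unlike bytes.Buffer.String(), Builder.String() returns the accumulated bytes without copying them. The function name joinKnownSize is hypothetical; the standard library's strings.Join follows essentially the same pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// joinKnownSize concatenates parts into one string. It sums the lengths
// first, so the builder allocates its backing array exactly once.
func joinKnownSize(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var b strings.Builder
	b.Grow(total) // one allocation of the final size
	for _, p := range parts {
		b.WriteString(p) // copies p into the reserved space; no regrowth
	}
	return b.String() // returns the builder's bytes without another copy
}

func main() {
	delta := strings.Repeat("\xff", 4)
	parts := make([]string, 8)
	for i := range parts {
		parts[i] = delta
	}
	s := joinKnownSize(parts)
	fmt.Println(len(s)) // 32
}
```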

huangapple
  • Posted on 2017-04-18 06:24:33
  • Link to this article (please keep it when reposting): https://go.coder-hub.com/43460604.html