What is the fastest way to concatenate several []byte together?

Question

Right now I'm using the code below (as in BenchmarkEncoder()) and it's fast, but I'm wondering if there is a faster, more efficient way. I benchmark with GOMAXPROCS=1 and:

sudo -E nice -n -20 go test -bench . -benchmem -benchtime 3s

package blackbird

import (
	"bytes"
	"encoding/hex"
	"encoding/json"
	"log"
	"testing"
)

var (
	d1, d2, d3, d4, outBytes []byte
	toEncode                 [][]byte
)

// mustDecode is a small helper added so that every decode error is checked,
// not only the last one. It aborts on malformed hex input.
func mustDecode(s string) []byte {
	b, err := hex.DecodeString(s)
	if err != nil {
		log.Fatalf("hex decoding failed: %v", err)
	}
	return b
}

func init() {
	d1 = mustDecode("6e5438fd9c3748868147d7a4f6d355dd")
	d2 = mustDecode("0740e2dfa4b049f2beeb29cc304bdb5f")
	d3 = mustDecode("ab6743272358467caff7d94c3cc58e8c")
	d4 = mustDecode("7411c080762a47f49e5183af12d87330e6d0df7dd63a44808db4e250cdea0a36182fce4a309842e49f4202eb90184dd5b621d67db4a04940a29e981a5aea59be")
	toEncode = [][]byte{d1, d2, d3, d4}
}

// Encode concatenates the given slices into one newly allocated slice.
func Encode(stuff [][]byte) []byte {
	return bytes.Join(stuff, nil)
}

// BenchmarkEncoderDirect calls bytes.Join without the Encode wrapper.
func BenchmarkEncoderDirect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bytes.Join(toEncode, nil)
	}
}

// BenchmarkEncoder measures the Encode wrapper around bytes.Join.
func BenchmarkEncoder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Encode(toEncode)
	}
}

// BenchmarkJsonEncoder is included for comparison only: json.Marshal produces
// a JSON array of base64 strings, not a raw concatenation.
func BenchmarkJsonEncoder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		outBytes, _ = json.Marshal(toEncode)
	}
}

What is the fastest way to concatenate several []byte together?

Answer 1

Score: 6

bytes.Join() is pretty fast, but it does some extra work: it copies the separator between each pair of adjacent slices, and it does so even if the separator is an empty or nil slice.

So if you care about the best performance (although it will be a slight improvement), you may do what bytes.Join() does without appending (empty) separators: allocate a big-enough byte slice, and copy each slice into the result using the built-in copy() function.

Try it on the Go Playground:

func Join(s ...[]byte) []byte {
	n := 0
	for _, v := range s {
		n += len(v)
	}

	b, i := make([]byte, n), 0
	for _, v := range s {
		i += copy(b[i:], v)
	}
	return b
}

Using it:

concatenated := Join(d1, d2, d3, d4)
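
To check whether the difference matters for your data, the two approaches can be compared outside of go test. The following is only a sketch: the input slices are filler bytes with the same sizes as in the question (three 16-byte IDs and one 64-byte blob), and sink is a hypothetical package variable that keeps the results alive so the calls are not optimized away. No timings are shown because they depend on the machine and the Go version.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// Join is the copy-based concatenation from the answer above.
func Join(s ...[]byte) []byte {
	n := 0
	for _, v := range s {
		n += len(v)
	}

	b, i := make([]byte, n), 0
	for _, v := range s {
		i += copy(b[i:], v)
	}
	return b
}

// sink (hypothetical) keeps results reachable so the compiler cannot drop the calls.
var sink []byte

func main() {
	// Required when calling testing.Benchmark outside of "go test".
	testing.Init()

	// Filler data with the same sizes as in the question (assumption).
	parts := [][]byte{
		bytes.Repeat([]byte{0x01}, 16),
		bytes.Repeat([]byte{0x02}, 16),
		bytes.Repeat([]byte{0x03}, 16),
		bytes.Repeat([]byte{0x04}, 64),
	}

	// Both approaches must produce identical output.
	fmt.Println("equal:", bytes.Equal(Join(parts...), bytes.Join(parts, nil)))

	std := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = bytes.Join(parts, nil)
		}
	})
	custom := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = Join(parts...)
		}
	})

	fmt.Println("bytes.Join:", std, std.MemString())
	fmt.Println("Join:      ", custom, custom.MemString())
}
```

Expect the gap to be small, as noted above; both versions allocate the result once and copy every input byte exactly once.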

Improvements:

If you know the total size in advance (or you can calculate it faster than looping over the slices), provide it and you can avoid having to loop over the slices in order to count the needed size:

func JoinSize(size int, s ...[]byte) []byte {
	b, i := make([]byte, size), 0
	for _, v := range s {
		i += copy(b[i:], v)
	}
	return b
}

Using it in your case:

concatenated := JoinSize(48 + len(d4), d1, d2, d3, d4)
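
One caveat with JoinSize: copy() copies only as many bytes as fit, so a size that is too small silently truncates the result, and a size that is too large leaves trailing zero bytes. If the size is only an estimate, a variant built on append() is more forgiving. The sketch below uses a hypothetical name, JoinCap, and treats the size as a capacity hint rather than the final length:

```go
package main

import "fmt"

// JoinCap (hypothetical name) concatenates the slices using sizeHint only as
// the initial capacity. A wrong hint costs performance, not correctness.
func JoinCap(sizeHint int, s ...[]byte) []byte {
	if sizeHint < 0 {
		sizeHint = 0 // make would panic on a negative capacity
	}
	b := make([]byte, 0, sizeHint)
	for _, v := range s {
		b = append(b, v...) // grows the backing array only if the hint was too small
	}
	return b
}

func main() {
	a, b := []byte("abc"), []byte("defg")

	fmt.Printf("%s\n", JoinCap(7, a, b))     // exact hint: a single allocation
	fmt.Printf("%s\n", JoinCap(2, a, b))     // hint too small: append reallocates
	fmt.Println(len(JoinCap(100, a, b)))     // hint too large: length is still 7
}
```

With an exact hint this performs one allocation, like JoinSize; the trade-off is that append has to check capacity on every call.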

Notes:

If your goal in the end is to write the concatenated byte slice into an io.Writer, it is better for performance not to concatenate the slices at all, but to write each one to the writer separately.

Answer 2

Score: 2

In general, @icza's answer is right. For your specific use-case, however, you can allocate once and decode into that buffer much more efficiently:

Like this:

package main

import (
    "encoding/hex"
)

func main() {
    h1 := []byte("6e5438fd9c3748868147d7a4f6d355dd")
    h2 := []byte("0740e2dfa4b049f2beeb29cc304bdb5f")
    h3 := []byte("ab6743272358467caff7d94c3cc58e8c")
    h4 := []byte("7411c080762a47f49e5183af12d87330e6d0df7dd63a44808db4e250cdea0a36182fce4a309842e49f4202eb90184dd5b621d67db4a04940a29e981a5aea59be")

    tg := make([]byte, 16+16+16+(1024*1024)) // allocate enough space for the 3 IDs and a max 1MB of extra data

    hex.Decode(tg[:16], h1)
    hex.Decode(tg[16:32], h2)
    hex.Decode(tg[32:48], h3)
    l, _ := hex.Decode(tg[48:], h4)

    tg = tg[:48+l]
}

At the end of that code, tg holds the 3 IDs plus the variable-length 4th chunk of data, decoded, contiguously.
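
If the 1MB upper bound is not known in advance, the buffer can instead be sized exactly with hex.DecodedLen, and the decode errors (ignored above) can be returned to the caller. This is a sketch of that variation rather than the answer's own code; decodeConcat is a hypothetical helper name:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeConcat (hypothetical helper) hex-decodes every input directly into a
// single buffer, so no per-input slice and no separate join step is needed.
func decodeConcat(hexParts ...[]byte) ([]byte, error) {
	// First pass: compute the exact decoded size.
	n := 0
	for _, h := range hexParts {
		n += hex.DecodedLen(len(h))
	}

	// Second pass: decode each input at its offset in the shared buffer.
	out := make([]byte, n)
	off := 0
	for _, h := range hexParts {
		m, err := hex.Decode(out[off:], h)
		if err != nil {
			return nil, err
		}
		off += m
	}
	return out[:off], nil
}

func main() {
	out, err := decodeConcat([]byte("6e54"), []byte("38fd"))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(hex.EncodeToString(out)) // prints: 6e5438fd
}
```

This trades the answer's fixed oversized allocation for one extra pass over the input lengths, which is cheap compared with the decoding itself.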

Posted by huangapple on September 3, 2015 at 16:27:25. Source: https://go.coder-hub.com/32370615.html