Why does Base64 buffer sizing make it larger than the length of the underlying text?

Question

I am trying to encode a byte slice as Base64 and have two questions. I can size the destination buffer with base64.StdEncoding.EncodedLen(len(text)), but I'm worried that's costly, so I wanted to see if I could do it with just len(text). Here is the code (the functions are named "Marshal" because I'm using them as field converters during JSON marshaling):

package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

func main() {
	b := make([]byte, 60)
	_, _ = rand.Read(b)

	// Encode straight to a string with EncodeToString.
	MarshalTextBuffer(b)

	// Encode into a destination buffer sized len(text)+30.
	MarshalTextStringWithBufferLen(b)

	// Encode into a destination buffer sized with EncodedLen.
	MarshalTextStringWithDecodedLen(b)
}

func MarshalTextBuffer(text []byte) error {
	ba := base64.StdEncoding.EncodeToString(text)
	fmt.Println(ba)
	return nil
}

func MarshalTextStringWithBufferLen(text []byte) error {
	ba := make([]byte, len(text)+30) // Why does len(text) not suffice? Temporarily using '30' for now, just so it doesn't overrun.
	base64.StdEncoding.Encode(ba, text)
	fmt.Println(ba)
	return nil
}

func MarshalTextStringWithDecodedLen(text []byte) error {
	ba := make([]byte, base64.StdEncoding.EncodedLen(len(text)))
	base64.StdEncoding.Encode(ba, text)
	fmt.Println(ba)
	return nil
}

Here's the output:

IL5CW8T9WSgwU5Hyi9JsLLkU/EcydY6pG2fgLQJsMaXgxhSh74RTagzr6b9yDeZ8CP4Azc8xqq5/+Cgk
[73 76 53 67 87 56 84 57 87 83 103 119 85 53 72 121 105 57 74 115 76 76 107 85 47 69 99 121 100 89 54 112 71 50 102 103 76 81 74 115 77 97 88 103 120 104 83 104 55 52 82 84 97 103 122 114 54 98 57 121 68 101 90 56 67 80 52 65 122 99 56 120 113 113 53 47 43 67 103 107 0 0 0 0 0 0 0 0 0 0]
[73 76 53 67 87 56 84 57 87 83 103 119 85 53 72 121 105 57 74 115 76 76 107 85 47 69 99 121 100 89 54 112 71 50 102 103 76 81 74 115 77 97 88 103 120 104 83 104 55 52 82 84 97 103 122 114 54 98 57 121 68 101 90 56 67 80 52 65 122 99 56 120 113 113 53 47 43 67 103 107]

Why does the middle function, MarshalTextStringWithBufferLen, need extra space beyond len(text)?

Is base64.StdEncoding.EncodedLen a costly function? I can solve the problem with the bottom function, but I worry about the cost.

Answer 1

Score: 1

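Base64 encoding stores binary data (8 bits per byte) as text that carries only 6 bits per character, so every 3 input bytes are encoded as 4 output bytes (3x8 = 4x6). That is why len(text) is not enough, and why len(text) + 30 in your code is wrong: it happens to fit your 60-byte input (which needs 80 bytes, hence the 10 trailing zeros in your output), but Encode will panic once the input is large enough that the encoded form no longer fits. The size should be len(text)*4/3 when len(text) is divisible by 3; in general, with padding, it is (len(text)+2)/3*4. For readability and to avoid bugs, use base64.StdEncoding.EncodedLen(len(text)) to get the length.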

If you look at the source of base64.StdEncoding.EncodedLen, you will see that it is just a little integer arithmetic, so it is as fast as doing the calculation yourself (especially as it is likely to be inlined).

huangapple
  • Posted on 2022-10-01 06:30:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73914511.html