How to generate a YouTube ID in Go?
Question
I'm assuming all I need to do is encode a 64-bit number (2^64 possible values) as base64 to get an 11-character YouTube identifier. I created a Go program: https://play.golang.org/p/2nuA3JxVMd0
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math"
	"math/big"
	"strings"
)

func main() {
	// For example Youtube uses 11 characters of base64.
	// How many base64 characters would it require to express a 2^64 number? 2^6^x = 2^64 .. so x = 64/6 = 10.666666666 … i.e. eleven rounded up.

	// Generate a 64 bit number
	val, _ := randint64()
	fmt.Println(val)

	// Encode the 64 bit number
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(val))
	encoded := base64.StdEncoding.EncodeToString([]byte(b))
	fmt.Println(encoded, len(encoded))

	// https://youtu.be/gocwRvLhDf8?t=75
	ytid := strings.ReplaceAll(encoded, "+", "-")
	ytid = strings.ReplaceAll(ytid, "/", "_")
	fmt.Println("Youtube ID from 64 bit number:", ytid)
}

func randint64() (int64, error) {
	val, err := rand.Int(rand.Reader, big.NewInt(int64(math.MaxInt64)))
	if err != nil {
		return 0, err
	}
	return val.Int64(), nil
}
But it has two issues:
- The identifier is 12 characters instead of the expected 11
- The encoded base64 suffix is "=" which means that it didn't have enough to encode?
So where am I going wrong?
Answer 1
Score: 2
tl;dr
An 8-byte int64 (no matter what value) will always encode to 11 base64 bytes followed by a single padding byte =, so you can reliably do this to get your 11-character YouTube ID:
var replacer = strings.NewReplacer(
	"+", "-",
	"/", "_",
)

ytid := replacer.Replace(encoded[:11])
or (H/T @Crowman & @Peter) one can encode without padding and without having to replace + and / by using base64.RawURLEncoding:
//encoded := base64.StdEncoding.EncodeToString(b) // may include + or /
ytid := base64.RawURLEncoding.EncodeToString(b) // produces URL-friendly - and _
https://play.golang.org/p/AjlvtfR7RWD
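Putting the RawURLEncoding suggestion together, a minimal sketch could look like the following. The helper name newID is made up for illustration, and reading 8 bytes directly from crypto/rand is an assumption that replaces the question's rand.Int call (which only yields values below math.MaxInt64, i.e. 63 bits). This only produces strings with the same shape as a YouTube ID; whether YouTube generates its IDs this way is not confirmed here.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newID (hypothetical helper) returns 8 random bytes encoded as an
// 11-character URL-safe base64 string.
func newID() (string, error) {
	b := make([]byte, 8) // 64 bits of randomness
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	// RawURLEncoding omits "=" padding and uses "-" and "_"
	// in place of "+" and "/".
	return base64.RawURLEncoding.EncodeToString(b), nil
}

func main() {
	id, err := newID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(id))
}
```

Because the input is always 8 bytes, the result is always 11 characters long, so no slicing or character replacement is needed.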
One byte (i.e. 8 bits) of base64 output conveys 6 bits of input. So the formula to determine the number of output bytes for a given number of input bytes is:
out = in * 8 / 6
or
out = in * 4 / 3
With a divisor of 3, this leads to partial use of an output byte in some cases. If the input length in bytes is:
- divisible by 3 - the final byte lands on a byte boundary
- not divisible by 3 - the final byte is not on a byte boundary and requires padding
In the case of 8 bytes of input:
out = 8 * 4 / 3 = 10 2/3
This uses 10 fully utilized base64 output bytes and one partially used byte (for the 2/3), so 11 base64 bytes plus padding to indicate how many bits were wasted.
Padding is indicated via the = character, and the number of = characters indicates how much of the final base64 byte is "wasted":
waste   padding
=====   =======
0       (none)
1/3     =
2/3     ==
Since the output uses 10 2/3 bytes, 1/3 of a byte is "wasted", so the padding is a single =.
So base64 encoding 8 input bytes will always produce 11 base64 bytes followed by a single = padding character, for 12 bytes in total.
Answer 2
Score: 1
= in base64 is padding. For a 64-bit number this padding is extra: the value itself does not need 12 characters. So why is it there? See the source of the Encoding.Encode function:
func (enc *Encoding) Encode(dst, src []byte) {
	if len(src) == 0 {
		return
	}
	// enc is a pointer receiver, so the use of enc.encode within the hot
	// loop below means a nil check at every operation. Lift that nil check
	// outside of the loop to speed up the encoder.
	_ = enc.encode

	di, si := 0, 0
	n := (len(src) / 3) * 3
	//https://golang.org/src/encoding/base64/base64.go
}
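The (len(src) / 3) * 3 part works in units of 3 input bytes, not 6 bits. Three bytes are 24 bits, i.e. exactly four base64 characters, so the padded output of this function always has a length that is a multiple of 4 (and is therefore even). If your input is always 64 bits, you can delete the = after encoding and add it back before decoding.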
for i := 8; i <= 18; i++ {
	b := make([]byte, i)
	binary.LittleEndian.PutUint64(b, uint64(0))
	encoded := base64.StdEncoding.EncodeToString(b)
	fmt.Println(encoded)
}
Output:
AAAAAAAAAAA=
AAAAAAAAAAAA
AAAAAAAAAAAAAA==
AAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAA
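The idea of dropping the = after encoding and restoring it before decoding could be sketched roughly as follows. The helper names encodeID and decodeID are made up for illustration, and the sketch assumes the input is always exactly 8 bytes, as in the question.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"strings"
)

// encodeID (hypothetical helper) encodes a 64-bit value and strips the
// single trailing "=" that 8 input bytes always produce.
func encodeID(v uint64) string {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, v)
	return strings.TrimSuffix(base64.StdEncoding.EncodeToString(b), "=")
}

// decodeID (hypothetical helper) restores the padding and decodes the
// value again.
func decodeID(s string) (uint64, error) {
	b, err := base64.StdEncoding.DecodeString(s + "=")
	if err != nil {
		return 0, err
	}
	if len(b) != 8 {
		return 0, fmt.Errorf("expected 8 bytes, got %d", len(b))
	}
	return binary.LittleEndian.Uint64(b), nil
}

func main() {
	id := encodeID(11154013587666973726)
	v, err := decodeID(id)
	fmt.Println(id, len(id), v, err)
}
```

base64.RawStdEncoding and base64.RawURLEncoding handle unpadded input and output directly, so the manual trimming is mostly useful for seeing what the padding does.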
What do I mean by 6 (or 3)?
base64 uses 64 characters, and each character maps to one 6-bit value (from 000000 to 111111).
Example:
a 64-bit value (uint64):
11154013587666973726
binary representation:
1001101011001011000001000100001011110000110001010011010000011110
split into groups of six bits (the first group is padded with two leading zeros):
001001,101011,001011,000001,000100,001011,110000,110001,010011,010000,011110
J, r, L, B, E, L, w, x, T, Q, e
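The grouping in this example can be reproduced with a small sketch. The function name sixBitGroups is made up for illustration; it treats the number as one 66-bit string with two leading zero bits, exactly as in the example above.

```go
package main

import "fmt"

// The standard base64 alphabet: index 0 is 'A', index 63 is '/'.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// sixBitGroups (hypothetical helper) splits v into eleven 6-bit groups,
// most significant group first. 11*6 = 66 bits, so the first group
// carries two implicit leading zero bits.
func sixBitGroups(v uint64) string {
	out := make([]byte, 11)
	for i := 10; i >= 0; i-- {
		out[i] = alphabet[v&0x3f] // the lowest 6 bits select one character
		v >>= 6
	}
	return string(out)
}

func main() {
	fmt.Println(sixBitGroups(11154013587666973726)) // JrLBELwxTQe
}
```

Note that this is not the same as running base64.StdEncoding over the 8 bytes of the value: that encoder starts at the first byte of the slice and fills the last group with trailing zero bits, so its output for the same number generally differs.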