How to generate a Youtube ID in Go?

Question

I'm assuming all I need to do is encode a 64-bit number as base64 to get an 11-character YouTube identifier. I created a Go program: https://play.golang.org/p/2nuA3JxVMd0

package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math"
	"math/big"
	"strings"
)

func main() {

	// For example, YouTube uses 11 characters of base64.
	// How many base64 characters does it take to express a 64-bit number? (2^6)^x = 2^64, so x = 64/6 = 10.666…, i.e. eleven rounded up.

	// Generate a 64 bit number
	val, _ := randint64()
	fmt.Println(val)

	// Encode the 64 bit number
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(val))
	encoded := base64.StdEncoding.EncodeToString([]byte(b))
	fmt.Println(encoded, len(encoded))

	// https://youtu.be/gocwRvLhDf8?t=75
	ytid := strings.ReplaceAll(encoded, "+", "-")
	ytid = strings.ReplaceAll(ytid, "/", "_")
	fmt.Println("Youtube ID from 64 bit number:", ytid)

}

func randint64() (int64, error) {
	val, err := rand.Int(rand.Reader, big.NewInt(int64(math.MaxInt64)))
	if err != nil {
		return 0, err
	}
	return val.Int64(), nil
}

But it has two issues:

  1. The identifier is 12 characters instead of the expected 11
  2. The encoded base64 suffix is "=" which means that it didn't have enough to encode?

So where am I going wrong?

Answer 1

Score: 2

tl;dr

An 8-byte int64 (no matter what value) always encodes to 11 base64 characters followed by a single padding character =, so you can reliably do this to get your 11-character YouTube ID:

var replacer = strings.NewReplacer(
	"+", "-",
	"/", "_",
)

ytid := replacer.Replace(encoded[:11])

or (H/T @Crowman & @Peter) use base64.RawURLEncoding, which omits the padding and already uses - and _ in place of + and /, so no replacement is needed:

//encoded := base64.StdEncoding.EncodeToString(b) // may include + or /
ytid := base64.RawURLEncoding.EncodeToString(b)  // produces URL-friendly - and _

https://play.golang.org/p/AjlvtfR7RWD
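
For reference, here is a minimal self-contained sketch of the RawURLEncoding approach. The helper name newID is hypothetical, and it reads 8 random bytes directly from crypto/rand rather than going through rand.Int as the question does (an assumption made to keep the example short). It only reproduces the shape of such an identifier and says nothing about how YouTube actually generates its IDs.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newID returns an 11-character, URL-safe identifier built from 8 random bytes.
// The name newID is hypothetical; it does not appear in the original answer.
func newID() (string, error) {
	b := make([]byte, 8) // 64 bits of randomness
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	// RawURLEncoding omits '=' padding and uses '-' and '_' instead of '+' and '/'.
	return base64.RawURLEncoding.EncodeToString(b), nil
}

func main() {
	id, err := newID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id, len(id))
}
```

Because 8 input bytes always need 11 base64 characters, the printed length is 11 on every run, while the ID itself changes each time.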


Each base64 output character (stored as one 8-bit byte) conveys 6 bits of input. So the formula for the number of output characters given a certain number of input bytes is:

out = in * 8 / 6

or

out = in * 4 / 3

With a divisor of 3, the last output character is only partially used in some cases. If the input length in bytes is:

  • divisible by 3 - the input ends exactly on an output character boundary and no padding is needed
  • not divisible by 3 - the input ends partway through an output character and padding is required

In the case of 8 bytes of input:

out = 8 * 4 / 3 = 10 2/3

This means 10 fully used base64 characters and one partially used character (for the 2/3), giving 11 base64 characters plus padding that indicates how much of the last character is unused.

Padding is written with the = character, and the number of = characters reflects the "wasted" fraction of the final character:

waste   padding
=====   =======   
0       
1/3     =
2/3     ==

Since the output uses 10 2/3 characters, 1/3 of the last character is "wasted", so the padding is a single =.

So base64 encoding 8 input bytes will always produce 11 base64 characters followed by a single = padding character, for 12 characters in total.
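
The arithmetic above can be checked against the standard library. This is a small sketch, assuming it is enough to compare EncodedLen for the padded StdEncoding with the unpadded RawStdEncoding; the helper name lengths is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// lengths returns the padded and unpadded base64 lengths for n input bytes.
// (Hypothetical helper, not part of the original answer.)
func lengths(n int) (padded, unpadded int) {
	return base64.StdEncoding.EncodedLen(n), base64.RawStdEncoding.EncodedLen(n)
}

func main() {
	for n := 6; n <= 9; n++ {
		padded, unpadded := lengths(n)
		// The difference between the two lengths is the number of '=' characters.
		fmt.Printf("input=%d bytes  unpadded=%d  padded=%d  '=' count=%d\n",
			n, unpadded, padded, padded-unpadded)
	}
}
```

For 8 input bytes this reports 11 unpadded and 12 padded characters, which matches the single = described above; 7 bytes give two = characters and 6 or 9 bytes give none.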

Answer 2

Score: 1

The = in base64 is padding. For a 64-bit number this padding carries no extra information, so the ID does not need all 12 characters. But why is it there?

See the source of the Encoding.Encode function:

func (enc *Encoding) Encode(dst, src []byte) {
	if len(src) == 0 {
		return
	}
	// enc is a pointer receiver, so the use of enc.encode within the hot
	// loop below means a nil check at every operation. Lift that nil check
	// outside of the loop to speed up the encoder.
	_ = enc.encode

	di, si := 0, 0
	n := (len(src) / 3) * 3
	// ... remainder omitted; see https://golang.org/src/encoding/base64/base64.go
}

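The (len(src) / 3) * 3 part works in units of 3 input bytes, not 6 bits: every 3 bytes (24 bits) become 4 output characters of 6 bits each.

So with padding enabled, as in StdEncoding, the output of this function is always a string whose length is a multiple of 4. If your input is always 64 bits, you can delete the = after encoding and add it back before decoding.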

for i := 8; i <= 18; i++ {
	b := make([]byte, i)
	binary.LittleEndian.PutUint64(b, uint64(0))
	encoded := base64.StdEncoding.EncodeToString(b)
	fmt.Println(encoded)
}

Output:

AAAAAAAAAAA=
AAAAAAAAAAAA
AAAAAAAAAAAAAA==
AAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAA
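
The idea of deleting the = after encoding and restoring it before decoding could look roughly like this. It is only a sketch: the helper names encodeID and decodeID are hypothetical, and the little-endian byte order is carried over from the question's code as an assumption.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"strings"
)

// encodeID encodes v as 11 characters by trimming the single '=' that
// StdEncoding appends for 8 input bytes. (Hypothetical helper.)
func encodeID(v uint64) string {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, v)
	return strings.TrimRight(base64.StdEncoding.EncodeToString(b), "=")
}

// decodeID restores the trimmed padding and decodes back to the number.
// (Hypothetical helper.)
func decodeID(s string) (uint64, error) {
	b, err := base64.StdEncoding.DecodeString(s + "=")
	if err != nil {
		return 0, err
	}
	if len(b) != 8 {
		return 0, fmt.Errorf("expected 8 bytes, got %d", len(b))
	}
	return binary.LittleEndian.Uint64(b), nil
}

func main() {
	id := encodeID(11154013587666973726)
	v, err := decodeID(id)
	fmt.Println(id, len(id), v, err)
}
```

Using base64.RawStdEncoding (or RawURLEncoding) for both directions avoids the manual trimming altogether; the version above just makes the padding step visible.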

What do I mean by 6 (or 3)?

base64 uses 64 characters, and each character maps to one 6-bit value (from 000000 to 111111).

Example:

A 64-bit value (uint64):

11154013587666973726

Binary representation:

1001101011001011000001000100001011110000110001010011010000011110

Split into groups of six bits, starting from the right (the leftmost group is filled out with two leading zeros):

001001,101011,001011,000001,000100,001011,110000,110001,010011,010000,011110

Each group maps to one character of the base64 alphabet:

J, r, L, B, E, L, w, x, T, Q, e
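
The grouping in this example can be reproduced with a few lines of Go. This is an illustrative sketch only: the function name sixBitGroups is hypothetical, and it follows the example's convention of reading the number as one big-endian bit string with two leading zero bits. encoding/base64 instead works on the byte slice and puts the spare bits at the end, so EncodeToString on the little-endian bytes from the question yields a different string for the same number.

```go
package main

import "fmt"

// The standard base64 alphabet: index 0 is 'A', index 63 is '/'.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// sixBitGroups maps v to 11 alphabet characters, taking six bits at a time
// starting from the least significant end. The leftmost group holds only
// the top 4 bits of v, padded with two leading zeros. (Hypothetical helper.)
func sixBitGroups(v uint64) string {
	out := make([]byte, 11)
	for i := 10; i >= 0; i-- {
		out[i] = alphabet[v&0x3f] // low six bits select one character
		v >>= 6
	}
	return string(out)
}

func main() {
	fmt.Println(sixBitGroups(11154013587666973726)) // prints JrLBELwxTQe
}
```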

huangapple
  • Posted on 2021-10-22 18:17:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69675161.html