How to implement a variable-byte encoding algorithm in Go
Question
I'm doing some work on integer compression.
I've implemented a variable-byte encoding algorithm in C++ (see the snippet below).
I wonder how to implement it in Go, since I cannot convert between int and string or rune values in memory the way memcpy() does.
I then found that binary.Write() in the encoding/binary package can do the serialization: it encodes a uint8 into 1 byte, a uint16 into 2 bytes, a uint32 into 4 bytes, and so on.
But how do I encode an integer between 2097152 and 268435456 using only 3 bytes?
Is there a conversion method similar to the snippet?
    void encode(int value, char* code_list, int& len) {
        int bit_value = 0;
        int bit_num = 0;
        if (value < 128) {
            bit_num = 1;
        } else if (value < 16384) {
            bit_num = 2;
            bit_value = 1;
        } else if (value < 2097152) {
            bit_num = 3;
            bit_value = 3;
        } else {
            bit_num = 4;
            bit_value = 7;
        }
        value <<= bit_num;
        value += bit_value;
        memcpy(code_list + len, (char*) &value, bit_num);
        len += bit_num;
    }
Answer 1
Score: 3
Your encoding is such that the number of trailing (least-significant) 1 bits in the first byte, plus one, tells you how many bytes the encoded value occupies.
Here's a Go implementation of your code that avoids depending on endianness (which your C++ version does) and uses an io.Writer rather than something like memcpy.
See it run at: https://play.golang.org/p/jr0NypSnlW
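    package main

    import (
        "bytes"
        "fmt"
        "io"
    )

    // encode writes n to w using 1 to 4 bytes, least-significant byte first.
    // The trailing 1 bits of the first byte, plus one, give the encoded length.
    // Values of 1<<28 or more do not fit in 4 bytes and lose their high bits.
    func encode(w io.Writer, n uint64) error {
        size := 0 // number of bytes to emit
        switch {
        case n < 128:
            size = 1
            n = n << 1
        case n < 16384:
            size = 2
            n = (n << 2) | 1
        case n < 2097152:
            size = 3
            n = (n << 3) | 3
        default:
            size = 4
            n = (n << 4) | 7
        }
        // Extract the bytes with shifts so the output does not depend on
        // the byte order of the host machine.
        d := [4]byte{
            byte(n), byte(n >> 8), byte(n >> 16), byte(n >> 24),
        }
        _, err := w.Write(d[:size])
        return err
    }

    func main() {
        xs := []uint64{0, 32, 20003, 60006, 300009}
        var b bytes.Buffer
        for _, x := range xs {
            if err := encode(&b, x); err != nil {
                panic(err)
            }
        }
        fmt.Println(b.Bytes())
    }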
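The answer gives only the encoding direction. As a rough sketch of the reverse step, the length rule described above (trailing 1 bits of the first byte, plus one) could be used to decode a value like this. The names appendEncoded and decode are hypothetical and not part of the original answer. The sketch also assumes that values of 1<<28 or more should be rejected rather than truncated, which is a design choice the original code does not make.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// appendEncoded is a hypothetical helper that follows the same byte layout as
// the encoder in the answer, but appends to a byte slice.
// Assumption: values of 1<<28 or more are rejected instead of truncated.
func appendEncoded(dst []byte, n uint32) ([]byte, error) {
	var size int
	switch {
	case n < 1<<7:
		size, n = 1, n<<1
	case n < 1<<14:
		size, n = 2, n<<2|1
	case n < 1<<21:
		size, n = 3, n<<3|3
	case n < 1<<28:
		size, n = 4, n<<4|7
	default:
		return dst, errors.New("value does not fit in 4 bytes")
	}
	d := [4]byte{byte(n), byte(n >> 8), byte(n >> 16), byte(n >> 24)}
	return append(dst, d[:size]...), nil
}

// decode reads one value from the front of src and reports how many bytes
// it consumed.
func decode(src []byte) (value uint32, size int, err error) {
	if len(src) == 0 {
		return 0, 0, errors.New("empty input")
	}
	// Count the trailing 1 bits of the first byte by counting the trailing
	// 0 bits of its complement; the encoded length is that count plus one.
	size = bits.TrailingZeros8(^src[0]) + 1
	if size > 4 {
		return 0, 0, errors.New("invalid length tag")
	}
	if len(src) < size {
		return 0, 0, errors.New("truncated input")
	}
	// Reassemble the little-endian bytes, then drop the tag bits.
	var raw uint32
	for i := 0; i < size; i++ {
		raw |= uint32(src[i]) << (8 * i)
	}
	return raw >> size, size, nil
}

func main() {
	var buf []byte
	for _, x := range []uint32{0, 32, 20003, 60006, 300009} {
		var err error
		if buf, err = appendEncoded(buf, x); err != nil {
			panic(err)
		}
	}
	// Walk the buffer, decoding one value at a time.
	for len(buf) > 0 {
		v, n, err := decode(buf)
		if err != nil {
			panic(err)
		}
		fmt.Println(v, "used", n, "byte(s)")
		buf = buf[n:]
	}
}
```

Because each value carries its own length in its first byte, a stream of encoded values can be walked front to back without any separate length table.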