如何在golang中实现可变字节编码算法

huangapple go评论119阅读模式
英文:

How to implement variable-byte encoding algorithm in golang

问题

我正在进行一些整数压缩的工作。
我已经在C++中实现了可变字节编码算法(请参见下面的代码片段)。

我想知道如何在Go语言中实现它,因为我无法像memcpy()那样在内存中在int类型和stringtune类型之间进行转换。

然后,我发现encoding/binary包中的binary.Write()函数可以进行序列化工作,它可以将uint8编码为一个字节,uint16编码为2个字节,uint32编码为4个字节,依此类推。

但是如何使用只有3个字节的方法来编码一个介于2097152和268435456之间的整数?

是否有类似于代码片段的转换方法?

  1. void encode(int value, char* code_list, int& len) {
  2. int bit_value = 0;
  3. int bit_num = 0;
  4. if (value < 128) {
  5. bit_num = 1;
  6. } else if (value < 16384) {
  7. bit_num = 2;
  8. bit_value = 1;
  9. } else if (value < 2097152) {
  10. bit_num = 3;
  11. bit_value = 3;
  12. } else {
  13. bit_num = 4;
  14. bit_value = 7;
  15. }
  16. value <<= bit_num;
  17. value += bit_value;
  18. memcpy(code_list + len, (char*) &value, bit_num);
  19. len += bit_num;
  20. }
英文:

I'm doing some work with integer compression.
I've implemented variable-byte encoding algorithm in c++ (see the snippet below).

I wonder how to implement it in golang since I cannot convert string or tune
type between int type in memory as memcpy() does.

Then, I've figured out binary.Write() in package encoding/binary can do the serializing work, which can encode uint8 into one byte, unint16 into 2 bytes, uint32 in 4 types and so on.

But how to encode a integer, which is between 2097152 and 268435456, using only 3 bytes ?

Is there any similar converting method like the snippet ?

  1. void encode(int value, char* code_list, int&amp; len) {
  2. int bit_value = 0;
  3. int bit_num = 0;
  4. if (value &lt; 128) {
  5. bit_num = 1;
  6. } else if (value &lt; 16384) {
  7. bit_num = 2;
  8. bit_value = 1;
  9. } else if (value &lt; 2097152) {
  10. bit_num = 3;
  11. bit_value = 3;
  12. } else {
  13. bit_num = 4;
  14. bit_value = 7;
  15. }
  16. value &lt;&lt;= bit_num;
  17. value += bit_value;
  18. memcpy(code_list + len, (char*) &amp;value, bit_num);
  19. len += bit_num;
  20. }

答案1

得分: 3

您的编码方式是,第一个字节中最低有效位为1的位数告诉您编码值有多少个字节。

以下是您代码的Go语言实现,它避免了依赖字节序(您的C版本有依赖),并使用了io.Writer而不是像memcpy那样的函数。

您可以在以下链接中运行代码:https://play.golang.org/p/jr0NypSnlW

  1. package main
  2. import (
  3. "fmt"
  4. "bytes"
  5. "io"
  6. )
  7. func encode(w io.Writer, n uint64) error {
  8. bytes := 0
  9. switch {
  10. case n < 128:
  11. bytes = 1
  12. n = (n << 1)
  13. case n < 16834:
  14. bytes = 2
  15. n = (n << 2) | 1
  16. case n < 2097152:
  17. bytes = 3
  18. n = (n << 3) | 3
  19. default:
  20. bytes = 4
  21. n = (n << 4) | 7
  22. }
  23. d := [4]byte{
  24. byte(n), byte(n>>8), byte(n>>16), byte(n>>24),
  25. }
  26. _, err := w.Write(d[:bytes])
  27. return err
  28. }
  29. func main() {
  30. xs := []uint64{0, 32, 20003, 60006, 300009}
  31. var b bytes.Buffer
  32. for _, x := range xs {
  33. if err := encode(&b, x); err != nil {
  34. panic(err)
  35. }
  36. }
  37. fmt.Println(b.Bytes())
  38. }
英文:

Your encoding is such that the count of least-significant 1 bits in the first byte tells you how many bytes the encoded value has.

Here's a Go implementation of your code, that avoids depending on endianness (which your C version does), and uses an io.Writer rather than something like memcpy.

See it run at: https://play.golang.org/p/jr0NypSnlW

  1. package main
  2. import (
  3. &quot;fmt&quot;
  4. &quot;bytes&quot;
  5. &quot;io&quot;
  6. )
  7. func encode(w io.Writer, n uint64) error {
  8. bytes := 0
  9. switch {
  10. case n &lt; 128:
  11. bytes = 1
  12. n = (n &lt;&lt; 1)
  13. case n &lt; 16834:
  14. bytes = 2
  15. n = (n &lt;&lt; 2) | 1
  16. case n &lt; 2097152:
  17. bytes = 3
  18. n = (n &lt;&lt; 3) | 3
  19. default:
  20. bytes = 4
  21. n = (n &lt;&lt; 4) | 7
  22. }
  23. d := [4]byte{
  24. byte(n), byte(n&gt;&gt;8), byte(n&gt;&gt;16), byte(n&gt;&gt;24),
  25. }
  26. _, err := w.Write(d[:bytes])
  27. return err
  28. }
  29. func main() {
  30. xs := []uint64{0, 32, 20003, 60006, 300009}
  31. var b bytes.Buffer
  32. for _, x := range xs {
  33. if err := encode(&amp;b, x); err != nil {
  34. panic(err)
  35. }
  36. }
  37. fmt.Println(b.Bytes())
  38. }

huangapple
  • 本文由 发表于 2017年9月5日 11:15:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/46046491.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定