binary.Read does not handle struct padding as expected

Question

In a recent Go project I need to read a binary data file generated by Python, but due to padding, binary.Read in Go doesn't read it properly. Below is a minimal example of my problem.

The struct I am dealing with has the following format:

    type Index struct {
        A int32
        B int32
        C int32
        D int64
    }

As you can see, the fields add up to 4+4+4+8=20 bytes, but Python adds an extra 4 bytes of padding before the 64-bit field for alignment, so each record actually occupies 24 bytes.
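The two sizes can be compared directly in Go. The following is a minimal sketch; the helper names `packedSize` and `memorySize` are made up for illustration. `binary.Size` reports the number of bytes `encoding/binary` reads or writes for a value, while `unsafe.Sizeof` reports the in-memory size, which includes compiler padding and depends on the platform (24 is what one would expect on a typical 64-bit target).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// Index mirrors the struct from the question.
type Index struct {
	A int32
	B int32
	C int32
	D int64
}

// packedSize is the number of bytes encoding/binary reads or writes for one
// Index: the field sizes added up, with no padding.
func packedSize() int {
	return binary.Size(Index{})
}

// memorySize is the size of Index in this program's memory, including any
// padding the Go compiler inserts; it depends on the target platform.
func memorySize() int {
	return int(unsafe.Sizeof(Index{}))
}

func main() {
	fmt.Println("binary.Size:  ", packedSize())
	fmt.Println("unsafe.Sizeof:", memorySize())
	fmt.Println("offset of D:  ", unsafe.Offsetof(Index{}.D))
}
```

If the two numbers differ, a record count derived from `unsafe.Sizeof` does not describe how many bytes `binary.Read` will consume per record.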

Below is the runnable Python code I use to write this struct:

    #!/usr/bin/env python
    # encoding=utf8
    import struct

    if __name__ == '__main__':
        data = range(1, 13)
        format = 'iiiq' * 3
        content = struct.pack(format, *data)
        with open('index.bin', 'wb') as f:
            f.write(content)

The iiiq format means the struct holds three 32-bit integers and one 64-bit integer, the same as the Index struct defined earlier. Running this code generates a file named index.bin of 72 bytes, which equals 24 * 3.
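The arithmetic behind the 72 bytes can be sketched in Go. This is only an illustration under one stated assumption: every field is aligned to a multiple of its own size, which is how native alignment behaves for the 4- and 8-byte integers used here. The function name `layout` is hypothetical.

```go
package main

import "fmt"

// layout returns the offset of each field and the total size when every
// field is aligned to a multiple of its own size (the assumption made here
// about native alignment for 4- and 8-byte integers). No padding is added
// after the last field.
func layout(sizes []int) (offsets []int, total int) {
	for _, size := range sizes {
		if rem := total % size; rem != 0 {
			total += size - rem // insert padding bytes
		}
		offsets = append(offsets, total)
		total += size
	}
	return offsets, total
}

func main() {
	record := []int{4, 4, 4, 8} // "iiiq"
	offsets, total := layout(record)
	fmt.Println(offsets, total)

	// "iiiq" * 3 is one long format string of twelve fields.
	var all []int
	for i := 0; i < 3; i++ {
		all = append(all, record...)
	}
	_, fileSize := layout(all)
	fmt.Println(fileSize)
}
```

With these rules the 64-bit field of each record lands on an 8-byte boundary, which is where the 4 extra bytes per record come from.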

And below is the Go code I use to read index.bin:

    package main

    import (
        "encoding/binary"
        "fmt"
        "io"
        "os"
        "unsafe"
    )

    type Index struct {
        A int32
        B int32
        C int32
        D int64
    }

    func main() {
        indexSize := unsafe.Sizeof(Index{})
        fp, _ := os.Open("index.bin")
        defer fp.Close()
        info, _ := fp.Stat()
        fileSize := info.Size()
        entryCnt := fileSize / int64(indexSize)
        fmt.Printf("entry cnt: %d\n", entryCnt)
        readSlice := make([]Index, entryCnt)
        reader := io.Reader(fp)
        _ = binary.Read(reader, binary.LittleEndian, &readSlice)
        fmt.Printf("After read:\n%#v\n", readSlice)
    }

And this is the output:

    entry cnt: 3
    After read:
    []main.Index{main.Index{A:1, B:2, C:3, D:17179869184}, main.Index{A:0, B:5, C:6, D:7}, main.Index{A:8, B:0, C:9, D:47244640266}}

Clearly, the values are scrambled when reading the Python-generated file.

So my question is: how can I correctly read the Python-generated file (with padding) in Go?
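The scrambled values can be reproduced without the file. The sketch below builds the same 72 padded bytes in memory and decodes them with the unpadded struct; `paddedFile` and `readPacked` are hypothetical helper names. Because `binary.Read` consumes only 20 bytes per record, every record after the first starts at the wrong offset, and the padding bytes end up inside D or A.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64
}

// paddedFile builds the same 72 bytes as index.bin in memory: three records
// of three int32 values, 4 zero padding bytes, and one int64 value.
// Writes to a bytes.Buffer do not fail, so the errors are not checked.
func paddedFile() []byte {
	var buf bytes.Buffer
	for v := int32(1); v <= 12; v += 4 {
		binary.Write(&buf, binary.LittleEndian, []int32{v, v + 1, v + 2, 0})
		binary.Write(&buf, binary.LittleEndian, int64(v+3))
	}
	return buf.Bytes()
}

// readPacked decodes n records the way the question's code does, with a
// struct that has no padding field.
func readPacked(data []byte, n int) ([]Index, error) {
	out := make([]Index, n)
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, out)
	return out, err
}

func main() {
	got, err := readPacked(paddedFile(), 3)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("%+v\n", got)
}
```

For example, the first D is decoded from 4 padding bytes followed by the low half of the real value 4, which gives 4 << 32 = 17179869184.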

Answer 1

Score: 8

You can just pad your Go struct to match:

    type Index struct {
        A int32
        B int32
        C int32
        _ int32
        D int64
    }

Which produces:

    []main.Index{main.Index{A:1, B:2, C:3, _:0, D:4}, main.Index{A:5, B:6, C:7, _:0, D:8}, main.Index{A:9, B:10, C:11, _:0, D:12}}

binary.Read knows to skip the _ field:

> When reading into structs, the field data for fields with blank (_) field names is skipped; i.e., blank field names may be used for padding.

(So the 0 values for _ are not because the padding in the file was set to zero, but because the struct field was initialized to 0 and never changed, and the padding in the file was skipped rather than read.)
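A round trip in memory illustrates the blank-field behaviour. This is a sketch, with `encode` and `decode` as hypothetical helper names: `binary.Write` emits zero bytes for the blank field, and `binary.Read` skips the same 4 bytes when decoding, so each record occupies 24 bytes in both directions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Index carries an explicit blank field where the file has padding.
type Index struct {
	A int32
	B int32
	C int32
	_ int32
	D int64
}

// encode writes the records in the padded 24-byte layout.
func encode(records []Index) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, records); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reads back as many whole records as data contains.
func decode(data []byte) ([]Index, error) {
	records := make([]Index, len(data)/binary.Size(Index{}))
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, records)
	return records, err
}

func main() {
	in := []Index{{A: 1, B: 2, C: 3, D: 4}, {A: 5, B: 6, C: 7, D: 8}}
	data, err := encode(in)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	out, err := decode(data)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(data), out)
}
```

Deriving the record count from `binary.Size` rather than `unsafe.Sizeof` keeps the count tied to the encoded layout instead of the in-memory one.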

Answer 2

Score: 1

For example,

    package main

    import (
        "bufio"
        "encoding/binary"
        "fmt"
        "io"
        "os"
    )

    type Index struct {
        A int32
        B int32
        C int32
        D int64
    }

    // readIndex reads one 24-byte record and decodes its fields by offset.
    func readIndex(r io.Reader) (Index, error) {
        var index Index
        var buf [24]byte
        _, err := io.ReadFull(r, buf[:])
        if err != nil {
            return index, err
        }
        index.A = int32(binary.LittleEndian.Uint32(buf[0:4]))
        index.B = int32(binary.LittleEndian.Uint32(buf[4:8]))
        index.C = int32(binary.LittleEndian.Uint32(buf[8:12]))
        // buf[12:16] holds the padding bytes and is skipped.
        index.D = int64(binary.LittleEndian.Uint64(buf[16:24]))
        return index, nil
    }

    func main() {
        f, err := os.Open("index.bin")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        defer f.Close()
        r := bufio.NewReader(f)
        indexes := make([]Index, 0, 1024)
        // Read records until the end of the file.
        for {
            index, err := readIndex(r)
            if err != nil {
                if err == io.EOF {
                    break
                }
                fmt.Fprintln(os.Stderr, err)
                return
            }
            indexes = append(indexes, index)
        }
        fmt.Println(indexes)
    }

Output:

    [{1 2 3 4} {5 6 7 8} {9 10 11 12}]

Input:

    00000000 01 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 |................|
    00000010 04 00 00 00 00 00 00 00 05 00 00 00 06 00 00 00 |................|
    00000020 07 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |................|
    00000030 09 00 00 00 0a 00 00 00 0b 00 00 00 00 00 00 00 |................|
    00000040 0c 00 00 00 00 00 00 00 |........|
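The same offset-based decoding can be wrapped in a loop that separates a clean end of input from a truncated record. The sketch below is a variation, not the code from the answer above; `readAll`, `decodeBytes`, and `recordSize` are names introduced here for illustration. It relies on `io.ReadFull` returning `io.EOF` only when no bytes were read and `io.ErrUnexpectedEOF` when a record is cut short.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// recordSize is 3*4 bytes of int32, 4 bytes of padding, 8 bytes of int64.
const recordSize = 24

type Index struct {
	A, B, C int32
	D       int64
}

// readAll decodes records until the reader is exhausted. A clean end of
// input stops the loop; a partial trailing record is reported as an error.
func readAll(r io.Reader) ([]Index, error) {
	var out []Index
	var buf [recordSize]byte
	for {
		_, err := io.ReadFull(r, buf[:])
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, fmt.Errorf("record %d: %w", len(out), err)
		}
		out = append(out, Index{
			A: int32(binary.LittleEndian.Uint32(buf[0:4])),
			B: int32(binary.LittleEndian.Uint32(buf[4:8])),
			C: int32(binary.LittleEndian.Uint32(buf[8:12])),
			// buf[12:16] is padding and is ignored.
			D: int64(binary.LittleEndian.Uint64(buf[16:24])),
		})
	}
}

// decodeBytes is a convenience wrapper for data already held in memory.
func decodeBytes(data []byte) ([]Index, error) {
	return readAll(bytes.NewReader(data))
}

func main() {
	one := make([]byte, recordSize)
	one[0], one[4], one[8], one[16] = 1, 2, 3, 4
	recs, err := decodeBytes(one)
	fmt.Println(recs, err)

	// Ten stray bytes after a full record count as a truncated record.
	_, err = decodeBytes(append(one, make([]byte, 10)...))
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF))
}
```

Decoding by explicit offsets keeps the Go struct free of padding fields, at the cost of hard-coding the file layout in the reader.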

Answer 3

Score: -1

@Barber's solution works, but I found adding a padding field inconvenient, so I looked for another way.

Below is the new Go reading code, which produces the expected output on my machine:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
        "unsafe"
    )

    type Index struct {
        A int32
        B int32
        C int32
        // Pad int32
        D int64
    }

    func main() {
        indexSize := unsafe.Sizeof(Index{})
        fp, _ := os.Open("index.bin")
        defer fp.Close()
        allBytes, _ := ioutil.ReadAll(fp)
        // Reinterpret the []byte slice header as a []Index slice header.
        // The length is still the byte count here, so it is corrected below.
        readSlice := *((*[]Index)(unsafe.Pointer(&allBytes)))
        realLen := len(allBytes) / int(indexSize)
        readSlice = readSlice[:realLen]
        fmt.Printf("After read:\n%#v\n", readSlice)
    }

Output:

    After read:
    []main.Index{main.Index{A:1, B:2, C:3, D:4}, main.Index{A:5, B:6, C:7, D:8}, main.Index{A:9, B:10, C:11, D:12}}

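This solution needs no explicit padding field.

The essence here is that the raw bytes are reinterpreted as a slice of Index, and the padding lines up because the Go compiler lays out Index in memory with the same 4 bytes of padding before D, at least on typical 64-bit platforms. The approach relies on unsafe and on the file matching the machine's byte order and struct layout, so it is not portable.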
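Whether the in-memory layout really matches the file can be checked before relying on a cast like this. The following is a sketch under the assumption that index.bin holds 24-byte little-endian records with D at offset 16; the helper names `hostIsLittleEndian`, `recordSizeInMemory`, and `layoutMatchesFile` are made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64
}

// hostIsLittleEndian reports whether this machine stores the low byte of an
// integer first.
func hostIsLittleEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

// recordSizeInMemory is the size of Index including compiler padding.
func recordSizeInMemory() int {
	return int(unsafe.Sizeof(Index{}))
}

// layoutMatchesFile reports whether Index in memory looks like one 24-byte
// little-endian record of index.bin: D at offset 16 and a total size of 24.
func layoutMatchesFile() bool {
	return hostIsLittleEndian() &&
		recordSizeInMemory() == 24 &&
		unsafe.Offsetof(Index{}.D) == 16
}

func main() {
	if layoutMatchesFile() {
		fmt.Println("layout matches; the unsafe cast sees the same bytes")
	} else {
		fmt.Println("layout differs; decode with encoding/binary instead")
	}
}
```

On a platform where the check fails, for example one that aligns int64 to 4 bytes, the cast would misread the file in the same way as the original question.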

huangapple
  • Posted on June 5, 2015, 09:20:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30656844.html