binary.Read does not handle struct padding as expected
Question
In a recent Go project I need to read a binary data file generated by Python, but due to padding, binary.Read
in Go doesn't read it properly. Below is a minimal example of my problem.
The struct I deal with is of the following format:
type Index struct {
	A int32
	B int32
	C int32
	D int64
}
As you can see, the size of the struct is 4+4+4+8 = 20, but Python adds an extra 4 bytes for alignment, so the actual size is 24.
Below is the runnable Python code I use to write this struct:
#!/usr/bin/env python
# encoding=utf8
import struct
if __name__ == '__main__':
    data = range(1, 13)
    format = 'iiiq' * 3
    content = struct.pack(format, *data)
    with open('index.bin', 'wb') as f:
        f.write(content)
The iiiq format means there are three 32-bit integers and one 64-bit integer in the struct, the same as the Index struct I defined earlier. Running this code generates a file named index.bin of size 72, which equals 24 * 3.
And below is the Go code I use to read index.bin:
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"unsafe"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64
}

func main() {
	indexSize := unsafe.Sizeof(Index{})
	fp, _ := os.Open("index.bin")
	defer fp.Close()
	info, _ := fp.Stat()
	fileSize := info.Size()
	entryCnt := fileSize / int64(indexSize)
	fmt.Printf("entry cnt: %d\n", entryCnt)
	readSlice := make([]Index, entryCnt)
	reader := io.Reader(fp)
	_ = binary.Read(reader, binary.LittleEndian, &readSlice)
	fmt.Printf("After read:\n%#v\n", readSlice)
}
And this is the output:
entry cnt: 3
After read:
[]main.Index{main.Index{A:1, B:2, C:3, D:17179869184}, main.Index{A:0, B:5, C:6, D:7}, main.Index{A:8, B:0, C:9, D:47244640266}}
Obviously the output is garbled when reading the Python-generated file.
So my question is: how can I properly read the Python-generated file (with padding) in Go?
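The garbled values can be reproduced without a file. This is a minimal sketch: `pythonBytes` and `readPacked` are hypothetical helpers. `pythonBytes` rebuilds the 72 bytes that the Python script writes, assuming a little-endian 64-bit machine. `binary.Read` consumes 20 bytes per element, so it reads only 60 of the 72 bytes, and each record after the first starts 4 bytes too early.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64
}

// pythonBytes (hypothetical helper) rebuilds what
// struct.pack('iiiq' * 3, *range(1, 13)) writes on a little-endian 64-bit
// machine: A, B, C, 4 zero padding bytes, then D, three times.
func pythonBytes() []byte {
	buf := make([]byte, 3*24)
	for rec := 0; rec < 3; rec++ {
		off := rec * 24
		base := uint32(rec*4 + 1) // records hold 1-4, 5-8 and 9-12
		binary.LittleEndian.PutUint32(buf[off:], base)
		binary.LittleEndian.PutUint32(buf[off+4:], base+1)
		binary.LittleEndian.PutUint32(buf[off+8:], base+2)
		// buf[off+12 : off+16] stays zero: the alignment padding.
		binary.LittleEndian.PutUint64(buf[off+16:], uint64(base+3))
	}
	return buf
}

// readPacked (hypothetical helper) decodes with binary.Read, which uses the
// packed 20-byte encoding of Index and therefore ignores the padding.
func readPacked(data []byte) ([]Index, error) {
	out := make([]Index, len(data)/24)
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, out)
	return out, err
}

func main() {
	got, err := readPacked(pythonBytes())
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for _, x := range got {
		fmt.Println(x.A, x.B, x.C, x.D)
	}
}
```

This prints `1 2 3 17179869184`, `0 5 6 7` and `8 0 9 47244640266`, which are the same values as the output above. For example, 17179869184 is 4 << 32: the first D picks up the 4 padding bytes plus the low half of the real D.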
Answer 1
Score: 8
You can just pad your Go struct to match:
type Index struct {
	A int32
	B int32
	C int32
	_ int32
	D int64
}
Which produces:
[]main.Index{main.Index{A:1, B:2, C:3, _:0, D:4}, main.Index{A:5, B:6, C:7, _:0, D:8}, main.Index{A:9, B:10, C:11, _:0, D:12}}
binary.Read knows to skip the _ field:
> When reading into structs, the field data for fields with blank (_) field names is skipped; i.e., blank field names may be used for padding.
(So the 0 values for _ are not because the padding in the file was set to zero, but because the struct field was initialized to 0 and never changed; the padding in the file was skipped rather than read.)
Answer 2
Score: 1
For example,
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64
}

func readIndex(r io.Reader) (Index, error) {
	var index Index
	var buf [24]byte
	_, err := io.ReadFull(r, buf[:])
	if err != nil {
		return index, err
	}
	index.A = int32(binary.LittleEndian.Uint32(buf[0:4]))
	index.B = int32(binary.LittleEndian.Uint32(buf[4:8]))
	index.C = int32(binary.LittleEndian.Uint32(buf[8:12]))
	index.D = int64(binary.LittleEndian.Uint64(buf[16:24]))
	return index, nil
}

func main() {
	f, err := os.Open("index.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	r := bufio.NewReader(f)
	indexes := make([]Index, 0, 1024)
	for {
		index, err := readIndex(r)
		if err != nil {
			if err == io.EOF {
				break
			}
			fmt.Fprintln(os.Stderr, err)
			return
		}
		indexes = append(indexes, index)
	}
	fmt.Println(indexes)
}
Output:
[{1 2 3 4} {5 6 7 8} {9 10 11 12}]
Input:
00000000 01 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 |................|
00000010 04 00 00 00 00 00 00 00 05 00 00 00 06 00 00 00 |................|
00000020 07 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |................|
00000030 09 00 00 00 0a 00 00 00 0b 00 00 00 00 00 00 00 |................|
00000040 0c 00 00 00 00 00 00 00 |........|
Answer 3
Score: -1
@Barber's solution works, but I find adding a padding field inconvenient, and I found a better way of doing it.
Below is the new Go read code:
package main

import (
	"fmt"
	"io"
	"os"
	"unsafe"
)

type Index struct {
	A int32
	B int32
	C int32
	// Pad int32
	D int64
}

func main() {
	indexSize := int(unsafe.Sizeof(Index{}))
	fp, err := os.Open("index.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer fp.Close()
	allBytes, err := io.ReadAll(fp)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Reinterpret the []byte slice header as a []Index header. Its length and
	// capacity are still counted in bytes, so both are cut down to the number
	// of whole records (a plain [:realLen] would leave the capacity at 72).
	readSlice := *((*[]Index)(unsafe.Pointer(&allBytes)))
	realLen := len(allBytes) / indexSize
	readSlice = readSlice[:realLen:realLen]
	fmt.Printf("After read:\n%#v\n", readSlice)
}
Output:
After read:
[]main.Index{main.Index{A:1, B:2, C:3, D:4}, main.Index{A:5, B:6, C:7, D:8}, main.Index{A:9, B:10, C:11, D:12}}
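This solution needs no explicit padding field.
It works because the bytes are reinterpreted in place rather than decoded. On 64-bit platforms Go lays out Index in memory with the same 4 bytes of padding before D that Python's native struct format inserts, and the host is little-endian like the machine that wrote the file. Where either assumption fails (for example on 32-bit x86, where Index is only 20 bytes), the records will be misread.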
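If you do want to rely on the in-memory layout, the assumptions can at least be checked at run time. This is a minimal sketch, not the answer's original code. `decodeNative` and `pythonBytes` are hypothetical helpers, and `unsafe.Slice` requires Go 1.17 or later. Instead of reinterpreting the byte slice header, it copies the raw bytes into a freshly allocated `[]Index`. This keeps `D` properly aligned and the slice capacity correct.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unsafe"
)

type Index struct {
	A int32
	B int32
	C int32
	D int64 // at offset 16 on 64-bit platforms, matching Python's native 'iiiq'
}

// decodeNative (hypothetical helper) copies raw into a []Index after checking
// that Go's layout and the host byte order match the file.
func decodeNative(raw []byte) ([]Index, error) {
	var layout Index
	size := int(unsafe.Sizeof(layout))
	if size != 24 || unsafe.Offsetof(layout.D) != 16 {
		return nil, errors.New("in-memory layout of Index differs from the 24-byte file layout")
	}
	probe := uint16(1)
	if *(*byte)(unsafe.Pointer(&probe)) != 1 {
		return nil, errors.New("host is not little-endian")
	}
	if len(raw)%size != 0 {
		return nil, fmt.Errorf("data length %d is not a multiple of %d", len(raw), size)
	}
	out := make([]Index, len(raw)/size)
	if len(out) == 0 {
		return out, nil
	}
	// View the destination as bytes and copy; out is allocated as []Index,
	// so its elements are correctly aligned.
	dst := unsafe.Slice((*byte)(unsafe.Pointer(&out[0])), len(raw))
	copy(dst, raw)
	return out, nil
}

// pythonBytes (hypothetical helper) rebuilds the 72 bytes the Python script writes.
func pythonBytes() []byte {
	buf := make([]byte, 3*24)
	for rec := 0; rec < 3; rec++ {
		off := rec * 24
		base := uint32(rec*4 + 1)
		binary.LittleEndian.PutUint32(buf[off:], base)
		binary.LittleEndian.PutUint32(buf[off+4:], base+1)
		binary.LittleEndian.PutUint32(buf[off+8:], base+2)
		binary.LittleEndian.PutUint64(buf[off+16:], uint64(base+3))
	}
	return buf
}

func main() {
	got, err := decodeNative(pythonBytes())
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", got)
}
```

The portable alternatives in the other answers avoid all three checks, which is usually the safer trade-off.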