June 5, 2015, 09:20:01 · go

binary.Read does not handle struct padding as expected

Question

In a recent Go project I need to read a binary data file generated by Python, but because of padding, binary.Read in Go doesn't read it correctly. Below is a minimal example of the problem.

The struct I deal with has the following format:

type Index struct{
    A int32
    B int32
    C int32
    D int64
}

The fields add up to 4+4+4+8=20 bytes, but Python inserts 4 padding bytes before the 64-bit field for alignment, so each record actually occupies 24 bytes.
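To make the two sizes concrete, here is a minimal Go sketch (not part of the original question) that compares the packed size encoding/binary works with against the in-memory size chosen by the compiler. The helper names wireSize and memSize are made up for illustration, and the in-memory figure is platform dependent.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// Index mirrors the struct from the question.
type Index struct {
	A int32
	B int32
	C int32
	D int64
}

// wireSize is the number of bytes binary.Read consumes per Index:
// the fields are packed back to back, with no padding.
func wireSize() int {
	return binary.Size(Index{})
}

// memSize is the in-memory size chosen by the Go compiler, which may
// include alignment padding depending on the platform.
func memSize() int {
	return int(unsafe.Sizeof(Index{}))
}

func main() {
	fmt.Println("binary.Size:  ", wireSize())
	fmt.Println("unsafe.Sizeof:", memSize())
}
```

On a typical 64-bit platform the two numbers differ (20 versus 24), which is why dividing the file size by unsafe.Sizeof yields 3 entries while binary.Read advances only 20 bytes per entry.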

Below is the runnable Python code I use to write this struct:

#!/usr/bin/env python
# encoding=utf8
import struct
if __name__ == '__main__':
    data = range(1, 13)
    # No byte-order prefix: struct uses native byte order, size, and alignment.
    format = 'iiiq' * 3
    content = struct.pack(format, *data)
    with open('index.bin', 'wb') as f:
        f.write(content)

The iiiq format means three 32-bit integers followed by one 64-bit integer, which matches the Index struct defined above. Running this code generates a file named index.bin of size 72, which equals 24 * 3.
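For readers who want the same bytes without running Python, the following Go sketch builds the 72-byte payload in memory. It assumes the layout Python produced on the asker's machine (little-endian, with D aligned to 8 bytes); buildIndexBytes and recordSize are hypothetical names introduced here.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// recordSize: 3*4 bytes of int32, 4 bytes of padding, 8 bytes of int64.
const recordSize = 24

// buildIndexBytes lays out the values 1..12 as three padded records.
func buildIndexBytes() []byte {
	buf := make([]byte, 3*recordSize) // zero-filled, so padding bytes stay 0
	v := uint64(1)
	for rec := 0; rec < 3; rec++ {
		base := rec * recordSize
		for field := 0; field < 3; field++ {
			binary.LittleEndian.PutUint32(buf[base+4*field:], uint32(v))
			v++
		}
		// Bytes base+12 through base+15 are the alignment padding.
		binary.LittleEndian.PutUint64(buf[base+16:], v)
		v++
	}
	return buf
}

func main() {
	buf := buildIndexBytes()
	fmt.Println(len(buf))
	fmt.Print(hex.Dump(buf))
}
```

Each record contributes 12 bytes of int32 data, 4 bytes of padding, and 8 bytes for the int64, giving 3 * 24 = 72 bytes in total.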

And below is the Go code I use to read index.bin:

package main
import (
        "encoding/binary"
        "fmt"
        "os"
        "io"
        "unsafe"
)
type Index struct {
        A int32
        B int32
        C int32
        D int64
}
func main() {
        indexSize := unsafe.Sizeof(Index{})
        fp, _ := os.Open("index.bin")
        defer fp.Close()
        info, _ := fp.Stat()
        fileSize := info.Size()
        entryCnt := fileSize / int64(indexSize)
        fmt.Printf("entry cnt: %d\n", entryCnt)
        readSlice := make([]Index, entryCnt)
        reader := io.Reader(fp)
        _ = binary.Read(reader, binary.LittleEndian, &readSlice)
        fmt.Printf("After read:\n%#v\n", readSlice)
}

And this is the output:

entry cnt: 3
After read:
[]main.Index{main.Index{A:1, B:2, C:3, D:17179869184}, main.Index{A:0, B:5, C:6, D:7}, main.Index{A:8, B:0, C:9, D:47244640266}}

Obviously the output is messed up when reading from the Python-generated file.

So my question is: how can I properly read the Python-generated file (with padding) in Go?

Answer 1

Score: 8

You can just pad your Go struct to match:

type Index struct {
	A int32
	B int32
	C int32
	_ int32
	D int64
}

Which produces:

[]main.Index{main.Index{A:1, B:2, C:3, _:0, D:4}, main.Index{A:5, B:6, C:7, _:0, D:8}, main.Index{A:9, B:10, C:11, _:0, D:12}}

binary.Read knows to skip the _ field, as the encoding/binary documentation states:

> When reading into structs, the field data for fields with blank (_) field names is skipped; i.e., blank field names may be used for padding.

(So the 0 values for _ are not because the padding in the file was set to zero, but because the struct field was initialized to 0 and never changed, and the padding in the file was skipped rather than read.)
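The skipping behaviour can be checked with a small self-contained sketch. The record below is a hypothetical one whose padding bytes are deliberately set to 0xFF instead of zero; decodeOne and sampleRecord are names made up for this illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Index carries an explicit blank field where the file has padding.
type Index struct {
	A int32
	B int32
	C int32
	_ int32
	D int64
}

// decodeOne reads a single 24-byte record with binary.Read.
func decodeOne(raw []byte) (Index, error) {
	var idx Index
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &idx)
	return idx, err
}

// sampleRecord returns one record whose padding is non-zero on purpose.
func sampleRecord() []byte {
	return []byte{
		1, 0, 0, 0, // A
		2, 0, 0, 0, // B
		3, 0, 0, 0, // C
		0xFF, 0xFF, 0xFF, 0xFF, // padding, deliberately non-zero
		4, 0, 0, 0, 0, 0, 0, 0, // D
	}
}

func main() {
	idx, err := decodeOne(sampleRecord())
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("%#v\n", idx)
}
```

If the padding is skipped as documented, the blank field still prints as 0 even though the input bytes were 0xFF, and D decodes to 4.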

Answer 2

Score: 1

For example, you can read each 24-byte record into a buffer and decode the fields at their offsets:

package main
import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)
type Index struct {
	A int32
	B int32
	C int32
	D int64
}
func readIndex(r io.Reader) (Index, error) {
	var index Index
	var buf [24]byte
	_, err := io.ReadFull(r, buf[:])
	if err != nil {
		return index, err
	}
	index.A = int32(binary.LittleEndian.Uint32(buf[0:4]))
	index.B = int32(binary.LittleEndian.Uint32(buf[4:8]))
	index.C = int32(binary.LittleEndian.Uint32(buf[8:12]))
	index.D = int64(binary.LittleEndian.Uint64(buf[16:24]))
	return index, nil
}
func main() {
	f, err := os.Open("index.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	r := bufio.NewReader(f)
	indexes := make([]Index, 0, 1024)
	for {
		index, err := readIndex(r)
		if err != nil {
			if err == io.EOF {
				break
			}
			fmt.Fprintln(os.Stderr, err)
			return
		}
		indexes = append(indexes, index)
	}
	fmt.Println(indexes)
}

Output:

[{1 2 3 4} {5 6 7 8} {9 10 11 12}]

Input:

00000000  01 00 00 00 02 00 00 00  03 00 00 00 00 00 00 00  |................|
00000010  04 00 00 00 00 00 00 00  05 00 00 00 06 00 00 00  |................|
00000020  07 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00000030  09 00 00 00 0a 00 00 00  0b 00 00 00 00 00 00 00  |................|
00000040  0c 00 00 00 00 00 00 00                           |........|

Answer 3

Score: -1

@Barber's solution works, but I find adding a padding field inconvenient, and I found what I consider a better way of doing it.

Below is the new Go reading code, which produces the expected output:

package main
import (
	"fmt"
	"os"
	"io"
	"io/ioutil"
	"unsafe"
)
type Index struct {
	A int32
	B int32
	C int32
	// Pad int32
	D int64
}
func main() {
	indexSize := unsafe.Sizeof(Index{})
	fp, _ := os.Open("index.bin")
	defer fp.Close()
	reader := io.Reader(fp)
	allBytes, _ := ioutil.ReadAll(reader)
	// Reinterpret the []byte slice header as a []Index header. The length is
	// still counted in bytes at this point, so reslice to the entry count.
	readSlice := *((*[]Index)(unsafe.Pointer(&allBytes)))
	realLen := len(allBytes) / int(indexSize)
	readSlice = readSlice[:realLen]
	fmt.Printf("After read:\n%#v\n", readSlice)
}

Output:

After read:
[]main.Index{main.Index{A:1, B:2, C:3, D:4}, main.Index{A:5, B:6, C:7, D:8}, main.Index{A:9, B:10, C:11, D:12}}

This solution needs no explicit padding field.

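The essence here is that the raw bytes are reinterpreted as a slice of Index. This handles the padding only because, on this platform, the Go compiler lays out Index in memory with the same 4 padding bytes before D (unsafe.Sizeof(Index{}) is 24). The approach therefore depends on the platform's alignment rules and byte order matching the file, and it is not portable.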
