2013-04-06 02:44:53 · go

How to be definite about the number of whitespace fmt.Fscanf consumes?

Question

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):

> Each PPM image consists of the following:
>
> 1. A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
> 2. Whitespace (blanks, TABs, CRs, LFs).
> 3. A width, formatted as ASCII characters in decimal.
> 4. Whitespace.
> 5. A height, again in ASCII decimal.
> 6. Whitespace.
> 7. The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
> 8. A single whitespace character (usually a newline).

I try to decode this header with the fmt.Fscanf function. The following call to
fmt.Fscanf parses the header (not addressing the caveat explained below):

var magic string
var width, height, maxVal uint

fmt.Fscanf(input, "%2s %d %d %d", &magic, &width, &height, &maxVal)

The documentation of fmt states:

> Note: Fscan etc. can read one character (rune) past the input they
> return, which means that a loop calling a scan routine may skip some
> of the input. This is usually a problem only when there is no space
> between input values. If the reader provided to Fscan implements
> ReadRune, that method will be used to read characters. If the reader
> also implements UnreadRune, that method will be used to save the
> character and successive calls will not lose data. To attach ReadRune
> and UnreadRune methods to a reader without that capability, use
> bufio.NewReader.

As the very next character after the final whitespace is already the beginning of the image data, I have to be certain about how many whitespace characters fmt.Fscanf consumed after reading MaxVal. My code must work on whatever reader was provided by the caller, and parts of it must not read past the end of the header. Wrapping the input in a buffered reader is therefore not an option: the buffered reader might read more from the input than I actually want to read.

Some testing suggests that parsing a dummy character at the end solves the issues:

var magic string
var width, height, maxVal uint
var dummy byte

fmt.Fscanf(input, "%2s %d %d %d%c", &magic, &width, &height, &maxVal, &dummy)

Is that guaranteed to work according to the specification?

Answer 1

Score: 1

No, I would not consider that safe. While it works now, the documentation states that the function reserves the right to read past the value by one character unless you have an UnreadRune() method.

By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.

buf := bufio.NewReader(input)
fmt.Fscanf(buf, "%2s %d %d %d", &magic, &width, &height, &maxVal)
buf.ReadRune() // remove next rune (the whitespace) from the buffer.

Edit:

As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:

package ppm // assumed package name; use the package under test

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

func TestFmtBehavior(t *testing.T) {
	// Use a MultiReader to prevent r from implementing io.RuneScanner.
	r := io.MultiReader(bytes.NewReader([]byte("data  ")))

	n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
	if n != 2 || err != nil {
		t.Error("failed scan", n, err)
	}

	// The dummy char read 1 extra char past "data",
	// so exactly one byte should still remain.
	if n, err := r.Read(make([]byte, 5)); n != 1 {
		t.Error("assertion failed", n, err)
	}
}
