How to be definite about the number of whitespace fmt.Fscanf consumes?

Question

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):

> Each PPM image consists of the following:
>
> 1. A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
> 2. Whitespace (blanks, TABs, CRs, LFs).
> 3. A width, formatted as ASCII characters in decimal.
> 4. Whitespace.
> 5. A height, again in ASCII decimal.
> 6. Whitespace.
> 7. The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
> 8. A single whitespace character (usually a newline).

I am trying to decode this header with the fmt.Fscanf function. The following call to fmt.Fscanf parses the header (not addressing the caveat explained below):

    var magic string
    var width, height, maxVal uint

    fmt.Fscanf(input, "%2s %d %d %d", &magic, &width, &height, &maxVal)

The documentation of fmt states:

> Note: Fscan etc. can read one character (rune) past the input they
> return, which means that a loop calling a scan routine may skip some
> of the input. This is usually a problem only when there is no space
> between input values. If the reader provided to Fscan implements
> ReadRune, that method will be used to read characters. If the reader
> also implements UnreadRune, that method will be used to save the
> character and successive calls will not lose data. To attach ReadRune
> and UnreadRune methods to a reader without that capability, use
> bufio.NewReader.

As the very next character after the final whitespace is already the beginning of the image data, I have to be certain about how many whitespace characters fmt.Fscanf consumed after reading MaxVal. My code must work on whatever reader the caller provided, and parts of it must not read past the end of the header. Wrapping the input in a buffered reader is therefore not an option: the buffered reader might read more from the input than I actually want to read.
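To make the caveat concrete, here is a minimal sketch of what happens when the reader has no UnreadRune method. The helper name `scanThenRest` and the sample input are made up for illustration, and the result reflects the behavior observed with the current fmt implementation, not a documented guarantee.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// scanThenRest (hypothetical helper) scans one decimal integer from a reader
// that has no UnreadRune method and returns the value plus whatever is left.
func scanThenRest(input string) (uint, string, error) {
	// io.MultiReader hides the RuneScanner methods of *bytes.Reader,
	// so fmt cannot push its lookahead character back into the reader.
	r := io.MultiReader(bytes.NewReader([]byte(input)))

	var v uint
	if _, err := fmt.Fscanf(r, "%d", &v); err != nil {
		return 0, "", err
	}
	rest, err := io.ReadAll(r)
	return v, string(rest), err
}

func main() {
	v, rest, err := scanThenRest("255\nDATA")
	// The newline after "255" was read as lookahead and is gone from rest.
	fmt.Printf("%d %q %v\n", v, rest, err)
}
```

In this run the delimiter after the number is consumed along with the number, which is exactly the character whose fate the question is about.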

Some testing suggests that parsing a dummy character at the end solves the issues:

    var magic string
    var width, height, maxVal uint
    var dummy byte

    fmt.Fscanf(input, "%2s %d %d %d%c", &magic, &width, &height, &maxVal, &dummy)

Is that guaranteed to work according to the specification?

Answer 1

Score: 1

No, I would not consider that safe. It works today, but the documentation states that the scanning functions may read one character past the value they return unless the reader has an UnreadRune() method.

By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.

    buf := bufio.NewReader(input)
    fmt.Fscanf(buf, "%2s %d %d %d", &magic, &width, &height, &maxVal)
    buf.ReadRune() // remove next rune (the whitespace) from the buffer.
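A fuller sketch of this approach might look as follows. The `header` type and the `readHeader` and `parseBytes` helpers are hypothetical names introduced here. The sketch also assumes the caller is willing to read the image data from the returned bufio.Reader, since that reader may already hold buffered bytes beyond the header.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// header is a hypothetical container for the four PPM header fields.
type header struct {
	Magic                 string
	Width, Height, MaxVal uint
}

// readHeader parses the header through a bufio.Reader and returns that
// reader, because it may already hold buffered image bytes.
func readHeader(input io.Reader) (header, *bufio.Reader, error) {
	buf := bufio.NewReader(input)
	var h header
	if _, err := fmt.Fscanf(buf, "%2s %d %d %d", &h.Magic, &h.Width, &h.Height, &h.MaxVal); err != nil {
		return h, nil, err
	}
	// fmt pushed its lookahead character back with UnreadRune, so the single
	// whitespace byte that ends the header is still pending. Consume it.
	if _, err := buf.ReadByte(); err != nil {
		return h, nil, err
	}
	return h, buf, nil
}

// parseBytes is a small driver: it returns the header and the image bytes.
func parseBytes(in []byte) (header, []byte, error) {
	h, rest, err := readHeader(bytes.NewReader(in))
	if err != nil {
		return h, nil, err
	}
	data, err := io.ReadAll(rest)
	return h, data, err
}

func main() {
	// The image data deliberately starts with whitespace-valued bytes.
	h, data, err := parseBytes([]byte("P6 2 1 255\n\n \xff"))
	fmt.Printf("%+v %q %v\n", h, data, err)
}
```

The trade-off is the one raised in the question: the original reader can no longer be used directly afterwards, because bufio may have read ahead of the header.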

Edit:

As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:

    import (
    	"bytes"
    	"fmt"
    	"io"
    	"testing"
    )

    func TestFmtBehavior(t *testing.T) {
    	// Use a MultiReader to prevent r from implementing io.RuneScanner.
    	r := io.MultiReader(bytes.NewReader([]byte("data  ")))

    	n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
    	if n != 2 || err != nil {
    		t.Error("failed scan", n, err)
    	}

    	// The dummy char read 1 extra char past "data",
    	// so exactly one byte should still remain.
    	if n, err := r.Read(make([]byte, 5)); n != 1 {
    		t.Error("assertion failed", n, err)
    	}
    }
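If relying on observed behavior is not acceptable and a bufio.Reader is ruled out, one alternative (not part of the original answer) is to give fmt an UnreadRune method without buffering ahead. The sketch below uses a hypothetical adapter, `byteScanner`, that reads one byte at a time. It assumes the header is plain ASCII, so treating each byte as a rune is safe. The names `header`, `readHeaderExact` and `parseExact` are likewise invented for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// byteScanner is a hypothetical adapter, not part of any library. It
// implements io.RuneScanner by pulling exactly one byte at a time from r,
// so the underlying reader never advances further than fmt has looked.
type byteScanner struct {
	r       io.Reader
	last    byte
	canUndo bool // last holds a byte that may still be unread
	pending bool // last must be delivered again by the next read
}

func (s *byteScanner) ReadRune() (rune, int, error) {
	if s.pending {
		s.pending, s.canUndo = false, true
		return rune(s.last), 1, nil
	}
	var b [1]byte
	if _, err := io.ReadFull(s.r, b[:]); err != nil {
		s.canUndo = false
		return 0, 0, err
	}
	s.last, s.canUndo = b[0], true
	return rune(b[0]), 1, nil
}

func (s *byteScanner) UnreadRune() error {
	if !s.canUndo {
		return errors.New("byteScanner: no byte to unread")
	}
	s.canUndo, s.pending = false, true
	return nil
}

// Read exists so that byteScanner satisfies io.Reader, which Fscanf requires.
func (s *byteScanner) Read(p []byte) (int, error) {
	s.canUndo = false
	if s.pending && len(p) > 0 {
		s.pending = false
		p[0] = s.last
		return 1, nil
	}
	return s.r.Read(p)
}

// header is a hypothetical container for the four PPM header fields.
type header struct {
	Magic                 string
	Width, Height, MaxVal uint
}

// readHeaderExact leaves input positioned at the first byte of image data.
func readHeaderExact(input io.Reader) (header, error) {
	var h header
	s := &byteScanner{r: input}
	if _, err := fmt.Fscanf(s, "%2s %d %d %d", &h.Magic, &h.Width, &h.Height, &h.MaxVal); err != nil {
		return h, err
	}
	// fmt handed its lookahead byte back through UnreadRune; read it here as
	// the single whitespace character that terminates the header.
	c, _, err := s.ReadRune()
	if err != nil {
		return h, err
	}
	if c != ' ' && c != '\t' && c != '\r' && c != '\n' {
		return h, fmt.Errorf("ppm: expected whitespace after maxval, got %q", c)
	}
	return h, nil
}

// parseExact is a small driver: it reports how many bytes of in are left
// unread in the underlying reader after the header.
func parseExact(in []byte) (header, int, error) {
	src := bytes.NewReader(in)
	h, err := readHeaderExact(src)
	return h, src.Len(), err
}

func main() {
	h, left, err := parseExact([]byte("P6 2 1 255\n\n \xff"))
	fmt.Printf("%+v left=%d err=%v\n", h, left, err)
}
```

Because the adapter never reads more than the byte fmt asked for, and the one byte fmt pushes back is consumed explicitly, the caller's reader can be used directly for the image data afterwards. The cost is one Read call per header byte, which should be negligible for a header this short.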

huangapple
  • Posted on April 6, 2013 at 02:44:53
  • Link to this article: https://go.coder-hub.com/15841257.html