How does an untyped constant '\n' get converted into a byte when passed as method arg?
Question
I was watching this talk given at FOSDEM '17 about implementing "tail -f" in Go => https://youtu.be/lLDWF59aZAo
In the author's initial example program, he creates a Reader using a file handle, and then uses the ReadString method with delimiter '\n' to read the file line by line and print its contents. I usually use Scanner, so this was new to me.
The program is below:
    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
    )

    func main() {
        fileHandle, err := os.Open("someFile.log")
        if err != nil {
            log.Fatalln(err)
            return
        }
        defer fileHandle.Close()

        reader := bufio.NewReader(fileHandle)
        for {
            line, err := reader.ReadString('\n')
            if err != nil {
                log.Fatalln(err)
                break
            }
            fmt.Print(line)
        }
    }
Now, ReadString takes a byte as its delimiter argument [https://golang.org/pkg/bufio/#Reader.ReadString].
So my question is, how in the world did '\n', which is a rune, get converted into a byte? I am not able to get my head around this, especially since byte is an alias for uint8 and rune is an alias for int32.
I asked the same question in Gophers Slack, and was told that '\n' is not a rune, but an untyped constant. If we actually created a rune using '\n' and passed it in, the compilation would fail. This actually confused me a bit more.
I was also given a link to a section of the Go spec regarding Type Identity => https://golang.org/ref/spec#Type_identity
If the program is not supposed to compile when given an actual rune, why does the compiler allow an untyped constant to go through? Isn't this unsafe behaviour?
My guess is that this works due to a rule in the Assignability section in the Go spec, which says
> x is an untyped constant representable by a value of type T.
Since '\n' can indeed be assigned to a variable of type byte, it is therefore converted.
Is my reasoning correct?
Answer 1
Score: 5
TL;DR: Yes, you are correct, but there is a bit more to it.
'\n' is an untyped rune constant. It has no type of its own, only a default type, which is rune (an alias for int32). Its value is the Unicode code point of the newline character, which is 10:
    package main

    import (
        "fmt"
    )

    func main() {
        fmt.Printf("%T %v %c\n", '\n', '\n', '\n') // int32 10 (newline)
    }
https://play.golang.org/p/lMjrTFDZUM
The part of the spec that answers your question is the Calls section:
> Given an expression f of function type F,
>
> f(a1, a2, … an)
>
> calls f with arguments a1, a2, … an. Except for one special case, arguments must be single-valued expressions assignable to the parameter types of F and are evaluated before the function is called.
"assignable" is the key term here and the part of the spec you quoted explains what it means. As you correctly guessed, among the various rules of assignability, the one that applies here is the following:
> x is an untyped constant representable by a value of type T.
In our case this translates to:
> '\n' is an untyped (rune) constant representable by a value of type byte
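To make the rule concrete, here is a minimal sketch contrasting an untyped constant with a typed rune value. The function takesByte is a hypothetical stand-in for any function with a byte parameter (such as the delimiter parameter of ReadString); it is not part of the original program.

```go
package main

import "fmt"

// takesByte is a hypothetical stand-in for a function with a byte
// parameter, e.g. the delimiter parameter of (*bufio.Reader).ReadString.
func takesByte(b byte) byte { return b }

func main() {
	// Untyped rune constant: its value 10 is representable as a byte,
	// so it is assignable to the parameter.
	fmt.Println(takesByte('\n'))

	// A named constant declared without a type is still untyped.
	const newline = '\n'
	fmt.Println(takesByte(newline))

	// A variable always has a type. A rune variable is not assignable
	// to byte, so the commented-out call below would not compile.
	var r rune = '\n'
	// takesByte(r)
	fmt.Println(takesByte(byte(r))) // explicit conversion is required
}
```

All three calls print 10. The only difference is whether the compiler can apply the "untyped constant" rule or needs an explicit conversion.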
That '\n' is actually converted to a byte when calling ReadString() becomes more apparent if we try passing an untyped rune constant whose value does not fit in a byte to a function that expects a byte:
    package main

    func main() {
        foo('α')
    }

    func foo(b byte) {}
https://play.golang.org/p/W0EUZppWHH
The code above fails with:
> tmp/sandbox120896917/main.go:9: constant 945 overflows byte
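That's because the value of 'α' is its Unicode code point, 945 (U+03B1), while the largest integer a byte can hold is 255. The constant is therefore not representable by a value of type byte, and the call is rejected at compile time. (Its UTF-8 encoding also takes 2 bytes, but it is the numeric value that matters here.)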
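The following sketch separates the two notions that are easy to mix up here: the numeric value of a rune and the length of its UTF-8 encoding. The helper fitsInByte and the extra sample character 'é' are my own additions for illustration.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// fitsInByte is a hypothetical helper: it reports whether the code point r
// lies in the range 0..255, i.e. whether a constant with this value would
// be representable by a value of type byte.
func fitsInByte(r rune) bool {
	return r >= 0 && r <= math.MaxUint8
}

func main() {
	for _, r := range []rune{'\n', 'é', 'α'} {
		fmt.Printf("%U value=%d utf8len=%d fitsInByte=%t\n",
			r, r, utf8.RuneLen(r), fitsInByte(r))
	}
	// Output:
	// U+000A value=10 utf8len=1 fitsInByte=true
	// U+00E9 value=233 utf8len=2 fitsInByte=true
	// U+03B1 value=945 utf8len=2 fitsInByte=false
}
```

Note that 'é' needs 2 bytes in UTF-8 yet still fits in a byte as a number, so the compiler's check is about the constant's value, not its encoded width.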
All this is explained in the official blog post, Constants.
Answer 2
Score: 2
Yes, your reading is correct. The Assignability section of the spec applies here because the value you pass must be assignable to the type of the parameter.
When you pass the value '\n', that is an untyped constant specified by a rune literal. It represents a number equal to the Unicode code point of the '\n' character (which is 10, by the way). The rule you quoted applies here:
> x is an untyped constant representable by a value of type T.
Constants have a default type, which will be used when a type is "missing" from the context where the value is used. Such an example is the short variable declaration:
    r := '\n'
    fmt.Printf("%T", r)
The default type of a rune literal is just that: rune. The above code prints int32 because the rune type is an alias for int32 (they are identical and interchangeable).
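As a small sketch of how the context decides the type, the same literal can end up as a rune, a byte, or even a float64, depending on where it is used. The helper typeName is hypothetical and only exists to make the types visible.

```go
package main

import "fmt"

// typeName is a hypothetical helper that returns the dynamic type of v
// as text, using the %T verb.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	r := '\n'            // no type in context: the default type rune (int32) is used
	var b byte = '\n'    // the declaration supplies the type byte (uint8)
	var f float64 = '\n' // 10 is representable as a float64, so this compiles too

	fmt.Println(typeName(r), typeName(b), typeName(f)) // int32 uint8 float64
}
```

Passing '\n' directly to an interface{} parameter also yields int32, since an interface gives the constant no concrete type to adopt.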
Now if you try to pass the variable r to a function which expects a value of type byte, it is a compile-time error, because this case matches none of the assignability rules. You need an explicit type conversion to make such a case work:

    r := '\n'
    line, err := reader.ReadString(byte(r))
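One caveat worth keeping in mind with the explicit conversion: unlike the constant case, converting a rune variable to byte is not range-checked, so values above 255 are silently truncated. A minimal sketch, where toByte is a hypothetical helper name:

```go
package main

import "fmt"

// toByte is a hypothetical helper that converts a rune value to a byte.
// For non-constant values the conversion keeps only the low 8 bits.
func toByte(r rune) byte {
	return byte(r)
}

func main() {
	fmt.Println(toByte('\n')) // 10
	fmt.Println(toByte('α'))  // 177, i.e. 945 mod 256
}
```

So byte(r) is safe for '\n', but for arbitrary runes it is worth checking the range before converting.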