What is the performance penalty for single-token lookahead?
Question
When comparing how Go and Scala detect the end of a statement, I found that Scala's rules are richer, namely:
> A line ending is treated as a semicolon unless one of the following
> conditions is true:
>
> * The line in question ends in a word that would not be legal as the end of a statement, such as a period or an infix operator.
> * The next line begins with a word that cannot start a statement.
> * The line ends while inside parentheses (...) or brackets [...], because these cannot contain multiple statements anyway.
Quoted from Scala - The rules of semicolon inference.
Go works the same way for rule #1, and for rule #3 too. The only difference is rule #2: it requires a single token of lookahead, since one token (the "word") is involved.
What kind of performance penalty does that involve: 1% slower, 5%, 10%?
I would also love to see a comment (not the main question) on why the Go designers left out that rule. If performance is not the reason, the rule makes the language more reliable, for example in method chaining:
```scala
x = some_object.select(...)
.sort(...)
.reverse(...)
.where(...)
.single()
```
If I am not mistaken, in Go this is an error (you can solve it in two ways, by wrapping the entire statement in braces or the expression in parentheses, but that is manual tweaking), whereas Scala handles it as intended.
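One way to see why the leading-dot chain fails in Go is to run the source through the standard library's `go/scanner`, which reports each automatically inserted semicolon as a `token.SEMICOLON` with the literal `"\n"`. The sketch below counts those insertions; the identifiers `obj`, `Select`, `Sort`, and `Single` are placeholders, since the scanner only looks at tokens. With leading dots a semicolon lands after every `)`, so the next line would start with `.`, which cannot begin a statement.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// autoSemicolons returns how many semicolons Go's scanner inserts
// automatically (reported with the literal "\n") while tokenizing src.
func autoSemicolons(src string) int {
	fset := token.NewFileSet()
	file := fset.AddFile("chain.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // mode 0: skip comments, insert semicolons
	count := 0
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			count++
		}
	}
	return count
}

func main() {
	// Leading dots: a semicolon follows each ")" at a line end.
	leading := "x = obj.Select()\n.Sort()\n.Single()\n"
	// Trailing dots: "." cannot end a statement, so only the last line gets one.
	trailing := "x = obj.Select().\nSort().\nSingle()\n"
	fmt.Println(autoSemicolons(leading))  // 3
	fmt.Println(autoSemicolons(trailing)) // 1
}
```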
Answer 1
Score: 7
The performance penalty is utterly negligible compared to everything else that the compiler has to do. The Scala-internals mailing list has the following exchange between Haoyi Li and Martin Odersky regarding a parboiled2 parser Haoyi wrote for Scala:
> Haoyi Li: In terms of perf[ormance], it can parse everything in scala/scala, lift, scalaz, scalajs, playframework and shapeless in 15 seconds.... Does anyone know how much of the time in the compiler and macros is spent parsing? My impression is that the vast vast vast majority of the time is spent in the typechecker.
> Odersky: Yes, parsing is pretty insignificant compared to the other tasks of a compiler ... That said, the [parser for the next-generation Scala compiler] (hand-written, 2100 lines including error reporting, accurate positions and tree construction) achieves several hundred thousand lines a second. So parboiled still has some way to go to beat that
When a parser that implements rule #2 handles hundreds of thousands of lines of code per second, one can infer that speed is not the issue. Go compilation tends to clock in at around 20k lines per second, so even if Go parsing took zero time, and the entire time for Scala parsing were taken up by the single-token lookahead, the penalty to the build process would be less than 10%.
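The bound above is a ratio of per-line costs: the time to parse one line (all of it charged to lookahead) divided by the time to build one line. A minimal sketch of that arithmetic, reading "several hundred thousand" as at least 200,000 lines per second (an assumption; the quote gives no exact figure):

```go
package main

import "fmt"

// worstCasePenalty bounds the lookahead's share of build time under the
// answer's pessimistic assumptions: Go parsing is free, and all of the
// Scala parser's time goes to the lookahead.
func worstCasePenalty(scalaParseLinesPerSec, goBuildLinesPerSec float64) float64 {
	parsePerLine := 1 / scalaParseLinesPerSec // seconds of parsing per line
	buildPerLine := 1 / goBuildLinesPerSec    // seconds of building per line
	return parsePerLine / buildPerLine
}

func main() {
	fmt.Printf("%.0f%%\n", 100*worstCasePenalty(200_000, 20_000)) // 10%
	fmt.Printf("%.1f%%\n", 100*worstCasePenalty(300_000, 20_000)) // 6.7%
}
```

Any parse rate above 200,000 lines per second pushes the bound below 10%.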
In reality it should be more like 0%. Lookahead is usually really cheap; you've already got a token stream, so you just look at the next one.
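To illustrate how little rule #2 adds, here is a hypothetical sketch of line-break handling over an already-scanned token stream. The `Kind` values and the `canEnd`/`canBegin` tables are simplified assumptions, not Go's or Scala's real token sets (Go, for instance, also lets keywords such as `return` and the tokens `]`, `}`, `++`, `--` end a line). Turning rule #2 on costs one extra slice read per line break.

```go
package main

import "fmt"

// Kind is a hypothetical, heavily simplified token classification.
type Kind int

const (
	Ident Kind = iota
	Literal
	Assign  // "="
	Period  // "."
	Comma   // ","
	LParen  // "("
	RParen  // ")"
	Newline // a line break between two tokens
	EOF
)

// canEnd approximates rule #1: may this token end a statement?
func canEnd(k Kind) bool {
	return k == Ident || k == Literal || k == RParen
}

// canBegin approximates rule #2: may this token begin a statement?
// Only "." and "," are rejected; EOF is accepted so the last line ends.
func canBegin(k Kind) bool {
	return k != Period && k != Comma
}

// inferTerminators counts line breaks treated as statement terminators.
// With lookahead == false, rule #2 is skipped (a Go-like scheme).
func inferTerminators(toks []Kind, lookahead bool) int {
	depth, count := 0, 0
	for i, t := range toks {
		switch t {
		case LParen:
			depth++
		case RParen:
			depth--
		case Newline:
			if i == 0 || depth > 0 { // rule #3: no breaks inside (...)
				continue
			}
			next := EOF
			if i+1 < len(toks) {
				next = toks[i+1] // the single-token lookahead
			}
			if canEnd(toks[i-1]) && (!lookahead || canBegin(next)) {
				count++
			}
		}
	}
	return count
}

func main() {
	// x = obj.Select()
	// .Sort()
	chain := []Kind{Ident, Assign, Ident, Period, Ident, LParen, RParen, Newline,
		Period, Ident, LParen, RParen, Newline}
	fmt.Println(inferTerminators(chain, false)) // 2: splits before ".Sort()"
	fmt.Println(inferTerminators(chain, true))  // 1: one statement
}
```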
Answer 2
Score: 1
It seems that if a line begins with anything that cannot start a statement, the compiler will complain. You can still chain methods in Go, though: https://play.golang.org/p/h8NYnBXjFI
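A minimal sketch of the usual Go workaround, which puts the dot at the end of each line so no semicolon is inserted. The `Query` type and its methods are hypothetical, loosely mirroring the question's chain (`select` is a Go keyword, so the sketch uses `Where`); this is not necessarily what the linked playground contains.

```go
package main

import (
	"fmt"
	"sort"
)

// Query is a hypothetical fluent type; each method returns the receiver.
type Query struct{ items []int }

func (q *Query) Where(keep func(int) bool) *Query {
	var out []int
	for _, v := range q.items {
		if keep(v) {
			out = append(out, v)
		}
	}
	q.items = out
	return q
}

func (q *Query) Sort() *Query {
	sort.Ints(q.items)
	return q
}

func (q *Query) Reverse() *Query {
	for i, j := 0, len(q.items)-1; i < j; i, j = i+1, j-1 {
		q.items[i], q.items[j] = q.items[j], q.items[i]
	}
	return q
}

// First returns the first remaining item (panics if none remain).
func (q *Query) First() int { return q.items[0] }

func main() {
	q := &Query{items: []int{5, 2, 9, 4, 7}}
	// Each line ends with ".", so Go does not insert a semicolon there.
	x := q.Where(func(v int) bool { return v%2 == 1 }).
		Sort().
		Reverse().
		First()
	fmt.Println(x) // 9
}
```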