2014年3月16日 22:50:45go评论110阅读模式

英文:

Go scanner - correctness for whitespace?

问题

Go语言中的text/scanner/scanner.go包中的Go扫描器使用了一个技巧来查找空白字符：

const GoWhitespace = 1&lt;&lt;'\\t' | 1&lt;&lt;'\\n' | 1&lt;&lt;'\\r' | 1&lt;&lt;' '

然后：

// 跳过空白字符
for s.Whitespace&(1&lt;&lt;uint(ch)) != 0 {
    ch = s.next()
}

由于字符值左移超过31位，是否可能存在不唯一的情况？我的意思是，当某个字符与32取模后与制表符相同时，它会被识别为空白字符吗？

英文:

The Go scanner package in text/scanner/scanner.go uses trick to find whitespace:

const GoWhitespace = 1&lt;&lt;&#39;\t&#39; | 1&lt;&lt;&#39;\n&#39; | 1&lt;&lt;&#39;\r&#39; | 1&lt;&lt;&#39; &#39;

And then:

// skip white space
for s.Whitespace&amp;(1&lt;&lt;uint(ch)) != 0 {
	ch = s.next()
}

Since character values shift left by more than 31, can there be cases where this is not unique? I mean, when some char is the same as tab modulo 32, it will be recognized as whitespace?

答案1

得分: 2

如规范中所述，<< 是一个位移操作符：

位移操作符将左操作数按照右操作数指定的位数进行位移。如果左操作数是有符号整数，则实现算术位移；如果是无符号整数，则实现逻辑位移。位移计数没有上限。位移的行为就好像左操作数按照位移计数 n 进行 n 次位移。因此，x << 1 等同于 x*2，x >> 1 等同于 x/2，但结果向负无穷截断。

对于较大的 ch 值，1<<uint(ch) 会导致溢出：

对于无符号整数值，操作 +、-、* 和 << 是模 2^n 运算，其中 n 是无符号整数类型的位宽。粗略地说，这些无符号整数操作在溢出时会丢弃高位，程序可以依赖于“环绕”。

对于有符号整数，操作 +、-、* 和 << 可以合法地溢出，结果值存在并由有符号整数表示、操作和操作数确定。溢出不会引发异常。编译器不能在假设不会发生溢出的情况下优化代码。例如，它不能假设 x < x + 1 总是为真。

因此，使用位旋转操作符（你所描述的）来实现 << 将违反规范。对于大于 int 类型大小的 ch 值，1<<uint(ch) 的结果将为零，因此不会导致任何错误判断。

英文:

As mentioned in the spec, the << is a shift operation:

> The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. There is no upper limit on the shift count. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.

For large values of ch, 1<<uint(ch) will cause an overflow:

> For unsigned integer values, the operations +, -, *, and << are computed modulo 2<sup>n</sup>, where n is the bit width of the unsigned integer's type. Loosely speaking, these unsigned integer operations discard high bits upon overflow, and programs may rely on ``wrap around''.
>
> For signed integers, the operations +, -, *, and << may legally overflow and the resulting value exists and is deterministically defined by the signed integer representation, the operation, and its operands. No exception is raised as a result of overflow. A compiler may not optimize code under the assumption that overflow does not occur. For instance, it may not assume that x < x + 1 is always true.

So implementing << with a bitwise rotation operator (what you seem to be describing) would violate the spec. 1<<uint(ch) will evaluate to zero for values of ch larger than the size of the int type, so won't cause any false positives.

答案2

得分: 0

规范明确指出，对于无符号操作，高位被屏蔽，因此低位实际上是“环绕”的。

它能够工作的原因是：

Scanner.Whitespace 实际上是 uint64，所以 GoWhitespace 的值完全适合。
在运行时，对无符号整数进行的操作 s.Whitespace&(1<<uint(ch)) 可能会产生任意大的中间值，并且会环绕。因此，如果字符是“a”（96），我们有 1 < < 96，它会溢出，所以模 64 位整数的大小就是 0。

英文:

Fully answer:

Spec explicitly say that for operations on unsigned, we get high bits masked out so the low bits are really "wrap around".

The reason it works is:

Scanner.Whitespace is actually uint64 so value of GoWhitespace fully fit
Operation s.Whitespace&(1<<uint(ch)) on unsigned integer at runtime can have intermediate values arbitrary large and will wrap around. So if say char is "a" (96) we have 1 << 96 which overflows so modulo size of 64-bit int it's 0.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Go扫描器 – 空白符的正确性？

问题

答案1

答案2

Golang net/http上传大文件时出现未定义错误。

为什么Cobra没有读取我的配置文件？

Golang AES CFB – 变异 IV

嵌套字典 Golang 的问题

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。