Misunderstanding Go Language specification on floating-point rounding

Question

The section on constant expressions in the Go language specification states:

> A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.


Does the sentence

> This rounding may cause a floating-point constant expression to be invalid in an integer context

point to something like the following:

package main

import "fmt"

func main() {
	a := 853784574674.23846278367
	fmt.Println(int8(a)) // output: 0
}

Answer 1

Score: 2

The part of the spec you quoted does not apply to your example: `a` is a variable, not a constant expression, so `int8(a)` converts a non-constant expression. That conversion is covered by Spec: Conversions, under "Conversions between numeric types":

> When converting a floating-point number to an integer, the fraction is discarded (truncation towards zero).
>
> [...] In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.

You are converting the non-constant expression `a`, whose value is 853784574674.23846278367, to an integer type. The fractional part is discarded, and because the result does not fit into an `int8`, the resulting value is not specified; it is implementation-dependent.
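
To make the distinction concrete, here is a minimal sketch of non-constant conversions. The helper `toInt8` is a name introduced only for this illustration. For values that fit into `int8`, the spec guarantees truncation toward zero; for the value from the question, whatever gets printed is implementation-dependent, so no expected output is given for it.

```go
package main

import "fmt"

// toInt8 is a hypothetical helper: because f is a variable (a parameter),
// int8(f) is a non-constant conversion.
func toInt8(f float64) int8 {
	return int8(f)
}

func main() {
	// In-range values: the fraction is discarded (truncation toward zero).
	for _, f := range []float64{3.99, -3.99, 127.5} {
		fmt.Printf("int8(%v) = %d\n", f, toInt8(f))
	}

	// Out-of-range value from the question: the conversion succeeds,
	// but the result is implementation-dependent.
	huge := 853784574674.23846278367
	fmt.Println(toInt8(huge))

	// By contrast, the constant conversion int8(853784574674.23846278367)
	// is rejected at compile time, so it cannot appear in runnable code.
}
```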

The quoted part means that although constants are represented with much higher precision than the built-in types (e.g. `float64` or `int64`), the precision a compiler has to implement is not infinite (for practical reasons). Even if a floating-point literal can be represented exactly, operations on such constants may be carried out with intermediate rounding and may not give the mathematically correct result.

The spec states the minimum precision every implementation must support:

> Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
>
> - Represent integer constants with at least 256 bits.
> - Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
> - Give an error if unable to represent an integer constant precisely.
> - Give an error if unable to represent a floating-point or complex constant due to overflow.
> - Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.

For example:

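package main

import "fmt"

// x and y are untyped floating-point constants.
const (
    x = 1e100000 + 1
    y = 1e100000
)

func main() {
    fmt.Println(x - y) // constant expression, evaluated at compile time
}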

This code should output 1, since `x` is 1 larger than `y`. Running it on the Go Playground outputs 0, because the constant expression `x - y` is evaluated with rounding and the `+1` is lost. Both `x` and `y` are integers (they have no fractional part), so in an integer context the result should be 1. But representing 1e100000 exactly takes about 332,000 bits, which the spec does not require of a compiler (a 256-bit mantissa is sufficient).

If we lower the constants, we get the correct result:

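package main

import "fmt"

// Same expression as before, with much smaller constants.
const (
    x = 1e1000 + 1
    y = 1e1000
)

func main() {
    fmt.Println(x - y) // constant expression, evaluated at compile time
}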

This outputs the mathematically correct result, 1. Try it on the Go Playground. Representing 1e1000 exactly takes about 3,322 bits, which seems to be supported (and is far above the 256-bit minimum).
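
The bit counts for these constants can be checked directly. This is a small sketch using `math/big`; `bitsForPow10` is a name made up for the illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// bitsForPow10 returns how many bits are needed to write 10^n
// as an exact binary integer.
func bitsForPow10(n int64) int {
	return new(big.Int).Exp(big.NewInt(10), big.NewInt(n), nil).BitLen()
}

func main() {
	for _, n := range []int64{1000, 100000} {
		fmt.Printf("1e%d needs %d bits\n", n, bitsForPow10(n))
	}
}
```

Both figures are far beyond the 256-bit minimum, so whether the `+1` survives depends on how much extra precision a given compiler chooses to provide.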

Answer 2

Score: 0

An `int8` is a signed integer that can hold values from -128 to 127. That's why you see an unexpected value from the `int8(a)` conversion.

huangapple
  • Posted on April 16, 2022 at 23:25:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71895156.html