Floating-point operations in Go

Question

Here's the sample code in Go:

package main

import "fmt"

func mult32(a, b float32) float32 { return a*b }
func mult64(a, b float64) float64 { return a*b }


func main() {
    fmt.Println(3*4.3)					// A1, 12.9
    fmt.Println(mult32(3, 4.3))			// B1, 12.900001
    fmt.Println(mult64(3, 4.3))			// C1, 12.899999999999999
    
    fmt.Println(12.9 - 3*4.3)			// A2, 1.8033161362862765e-130
    fmt.Println(12.9 - mult32(3, 4.3))	// B2, -9.536743e-07
    fmt.Println(12.9 - mult64(3, 4.3))	// C2, 1.7763568394002505e-15
    
    fmt.Println(12.9 - 3*4.3)								// A4, 1.8033161362862765e-130
    fmt.Println(float32(12.9) - float32(3)*float32(4.3))	// B4, -9.536743e-07
    fmt.Println(float64(12.9) - float64(3)*float64(4.3))	// C4, 1.7763568394002505e-15
    
}


The differences between the results on lines A1, B1 and C1 are understandable. However, from A2 to C2 the magic begins: neither B2 nor C2 matches the result of line A2. The same is true for the x4 lines (x = A, B or C), although the outputs of x2 and x4 are identical.

Just to be sure, let's print the results in binary form.



    fmt.Printf("%b\n", 3*4.3)					// A11, 7262054399134925p-49
    fmt.Printf("%b\n", mult32(3, 4.3))			// B11, 13526631p-20
    fmt.Printf("%b\n", mult64(3, 4.3))			// C11, 7262054399134924p-49
    
    fmt.Printf("%b\n", 12.9 - 3*4.3)			// A12, 4503599627370496p-483
    fmt.Printf("%b\n", 12.9 - mult32(3, 4.3))	// B12, -8388608p-43
    fmt.Printf("%b\n", 12.9 - mult64(3, 4.3))	// C12, 4503599627370496p-101
    
    fmt.Printf("%b\n", 12.9 - 3*4.3)								// A14, 4503599627370496p-483
    fmt.Printf("%b\n", float32(12.9) - float32(3)*float32(4.3))		// B14, -8388608p-43
    fmt.Printf("%b\n", float64(12.9) - float64(3)*float64(4.3))		// C14, 4503599627370496p-101


Some observations from the code above (the version in binary form):

1. Lines A11 and C11 differ in the last digit of the mantissa (just before the exponent).
2. Lines A12 and C12 have the same mantissa and differ only in the exponent (!); the same holds for lines A14 and C14.

And here come the questions:

1. How are computations on bare (naked :)) numbers performed? (That is, the computations on every Axx line.)
2. Are they performed by the compiler or something else?
3. If yes, then why are the results different? Optimization?
4. Are they computed in some system that differs from IEEE 754?
5. If yes, why?
6. Does achieving higher precision justify such an approach?

The code was tested on 64-bit Linux with both "go run" and "go build" (go1.0.3), and also on http://tour.golang.org/

Answer 1

Score: 5

  1. Constants:
    > - Numeric constants represent values of arbitrary precision and do not overflow.
    > - Represent integer constants with at least 256 bits.
    > - Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.

  2. Yes, the compiler evaluates compile-time constant expressions.

  3. Yes, they're different: more precision is involved. See 1.

  4. Yes, see 1.

  5. To minimize the accumulation of floating-point errors in multi-term floating-point constant expressions.

  6. Of course. Can achieving lower precision ever be a goal? Run-time floating-point operations are intrinsically imperfect already; there is no need to add more imprecision from constant expressions.

Answer 2

Score: 0

> Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.

Note that Go 1.8 (currently in beta in Q4 2016, released in Q1 2017) changes that definition:

> The language specification now only requires that implementations support up to 16-bit exponents in floating-point constants.
> This does not affect either the “gc” or gccgo compilers, both of which still support 32-bit exponents.

That comes from change 17711:

> ## spec: require 16 bit minimum exponent in constants rather than 32
>
> A 16bit binary exponent permits a constant range covering roughly the range from 7e-9865 to 7e9863 which is more than enough for any practical and hypothetical constant arithmetic.
>
> Furthermore, until recently cmd/compile could not handle very large exponents correctly anyway; i.e., the chance that any real programs (but for tests that explore corner cases) are affected are close to zero.
>
> Finally, restricting the minimum supported range significantly reduces the implementation complexity in an area that hardly matters in reality for new or alternative spec-compliant implementations that don't or cannot rely on pre-existing arbitrary precision arithmetic packages that support a 32bit exponent range.
>
> This is technically a language change but for the reasons mentioned above this is unlikely to affect any real programs, and certainly not programs compiled with the gc or gccgo compilers as they currently support up to 32bit exponents.

See also issue 13572, which mentions that:

> In Go 1.4 the compiler rejected exponents larger than 10000 (due to knowing the code didn't work for larger exponents) without any complaint from users.
>
> In earlier versions of Go, large exponents were silently mishandled, again without any complaint from users.

huangapple
  • Posted on August 5, 2013, 19:13:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/18056787.html