Golang floating point precision float32 vs float64

Question

I wrote a program to demonstrate floating point error in Go:

package main

import "fmt"

func main() {
	a := float64(0.2)
	a += 0.1
	a -= 0.3
	var i int
	for i = 0; a < 1.0; i++ {
		a += a
	}
	fmt.Printf("After %d iterations, a = %e\n", i, a)
}

It prints:

After 54 iterations, a = 1.000000e+00

This matches the behaviour of the same program written in C (using the double type).

However, if float32 is used instead, the Go program gets stuck in an infinite loop! If you modify the C program to use a float instead of a double, it prints:

After 27 iterations, a = 1.600000e+00

Why doesn't the Go program have the same output as the C program when using float32?

Answer 1

Score: 33

Using math.Float32bits and math.Float64bits, you can see how Go represents the different decimal values as IEEE 754 binary values:

Playground: https://play.golang.org/p/ZqzdCZLfvC

Result:

float32(0.1): 00111101110011001100110011001101
float32(0.2): 00111110010011001100110011001101
float32(0.3): 00111110100110011001100110011010
float64(0.1): 0011111110111001100110011001100110011001100110011001100110011010
float64(0.2): 0011111111001001100110011001100110011001100110011001100110011010
float64(0.3): 0011111111010011001100110011001100110011001100110011001100110011
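The Playground code itself is not reproduced on this page. As a minimal sketch of how output like the above could be produced, the following uses math.Float32bits and math.Float64bits together with zero-padded binary formatting. The helper names bits32 and bits64 are hypothetical and are not taken from the linked code.

```go
package main

import (
	"fmt"
	"math"
)

// bits32 returns the 32-bit IEEE 754 pattern of f as a binary string.
// (Hypothetical helper, named here only for illustration.)
func bits32(f float32) string {
	return fmt.Sprintf("%032b", math.Float32bits(f))
}

// bits64 returns the 64-bit IEEE 754 pattern of f as a binary string.
// (Hypothetical helper, named here only for illustration.)
func bits64(f float64) string {
	return fmt.Sprintf("%064b", math.Float64bits(f))
}

func main() {
	// The untyped constants are converted directly to the parameter type,
	// so each one is rounded only once.
	fmt.Println("float32(0.1):", bits32(0.1))
	fmt.Println("float32(0.2):", bits32(0.2))
	fmt.Println("float32(0.3):", bits32(0.3))
	fmt.Println("float64(0.1):", bits64(0.1))
	fmt.Println("float64(0.2):", bits64(0.2))
	fmt.Println("float64(0.3):", bits64(0.3))
}
```

The constants are passed directly, not through a float64 variable, so that the float32 values are not rounded twice.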

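If you convert these binary representations to decimal values and combine them exactly, you can see that for float32, the initial value of a works out to:

0.20000000298023224
+ 0.10000000149011612
- 0.30000001192092896
= -7.4505806e-9

a negative value that can never sum up to 1, because doubling it only makes it more negative. If each intermediate result is rounded to float32, as happens when a is a float32 variable, the sum of the first two values rounds to exactly the float32 value of 0.3, so a starts at 0 instead. Doubling 0 never reaches 1 either, so the loop does not terminate in either case.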
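The arithmetic above can be checked with a short Go sketch. It computes the initial value twice: once in float64 from the three float32 values, so no intermediate rounding to float32 takes place, and once with a float32 variable as in the question. The names exactInit, roundedInit and doubleUntilOne are hypothetical, and the iteration limit is an assumption added so the sketch terminates. On a platform where each float32 operation is rounded individually, one would expect the first value to be about -7.45e-9 and the second to be exactly 0.

```go
package main

import "fmt"

// exactInit combines the three float32 values in float64, so the
// intermediate sum is not rounded back to float32.
func exactInit() float64 {
	x, y, z := float32(0.2), float32(0.1), float32(0.3)
	return float64(x) + float64(y) - float64(z)
}

// roundedInit performs the same steps on a float32 variable, so every
// intermediate result is rounded to float32, as in the question's program.
func roundedInit() float32 {
	a := float32(0.2)
	a += 0.1
	a -= 0.3
	return a
}

// doubleUntilOne repeats a += a while a < 1.0, giving up after limit
// iterations so that a non-terminating case can still be observed.
func doubleUntilOne(a float32, limit int) (int, float32) {
	i := 0
	for ; a < 1.0 && i < limit; i++ {
		a += a
	}
	return i, a
}

func main() {
	fmt.Printf("exact:   %e\n", exactInit())
	fmt.Printf("rounded: %e\n", roundedInit())

	n, a := doubleUntilOne(roundedInit(), 1000)
	fmt.Printf("stopped after %d iterations, a = %e\n", n, a)
}
```

If the loop stops only because it hit the limit, the starting value can never reach 1 by doubling.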

So, why does C behave differently?

If you look at the binary pattern (and know a little about how binary values are represented), you can see that Go rounds the last bit, while I assume C just truncates it instead.

So, in a sense, while neither Go nor C can represent 0.1 exactly in a float, Go uses the value closest to 0.1:

Go:   00111101110011001100110011001101 => 0.10000000149011612
C(?): 00111101110011001100110011001100 => 0.09999999403953552

Edit:

I posted a question about how C handles float constants, and from the answer it seems that any implementation of the C standard is allowed to do either. The implementation you tried it with just did it differently than Go.

Answer 2

Score: 17

I agree with ANisus: Go is doing the right thing. Concerning C, I'm not convinced by his guess.

The C standard does not dictate, but most implementations of libc will convert the decimal representation to nearest float (at least to comply with IEEE-754 2008 or ISO 10967), so I don't think this is the most probable explanation.

There are several reasons why the C program's behavior might differ. In particular, some intermediate computations might be performed with excess precision (double or long double).

The most probable thing I can think of is that you wrote 0.1 instead of 0.1f in C.
In that case, you might have caused excess precision in the initialization
(you sum float a + double 0.1 => the float is converted to double, then the result is converted back to float).

If I emulate these operations:

float32(float32(float32(0.2) + float64(0.1)) - float64(0.3))

then I find something near 1.1920929e-8f.

After 27 iterations, this sums to 1.6f.
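The expression above mixes float32 and float64 operands, which Go only accepts with explicit conversions. Below is a minimal sketch of the same emulation in compilable Go. It assumes the C program kept a in a float while the constants 0.1 and 0.3 were doubles, which is this answer's hypothesis and is not confirmed by the question. The names cStyleInit and doubleUntilOne are hypothetical, and the iteration limit is an assumption added so the sketch always terminates.

```go
package main

import "fmt"

// cStyleInit emulates "float a = 0.2; a += 0.1; a -= 0.3;" under the
// assumption that 0.1 and 0.3 are double constants: each operation is
// done in float64 and the result is rounded back to float32.
func cStyleInit() float32 {
	a := float32(0.2)
	a = float32(float64(a) + 0.1)
	a = float32(float64(a) - 0.3)
	return a
}

// doubleUntilOne repeats a += a while a < 1.0, giving up after limit
// iterations as a safety net.
func doubleUntilOne(a float32, limit int) (int, float32) {
	i := 0
	for ; a < 1.0 && i < limit; i++ {
		a += a
	}
	return i, a
}

func main() {
	start := cStyleInit()
	n, a := doubleUntilOne(start, 1000)
	fmt.Printf("start = %e\n", start)
	fmt.Printf("After %d iterations, a = %e\n", n, a)
}
```

Under these assumptions the starting value is a small positive number, so the loop terminates and reproduces the C output quoted in the question.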

huangapple
  • Posted on March 12, 2014, 05:50:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/22337418.html