Golang floating point precision float32 vs float64
Question
I wrote a program to demonstrate floating point error in Go:
package main

import "fmt"

func main() {
    a := float64(0.2)
    a += 0.1
    a -= 0.3
    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}
It prints:
After 54 iterations, a = 1.000000e+00
This matches the behaviour of the same program written in C (using the double type).
However, if float32 is used instead, the program gets stuck in an infinite loop! If you modify the C program to use a float instead of a double, it prints:
After 27 iterations, a = 1.600000e+00
Why doesn't the Go program have the same output as the C program when using float32?
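For reference, the float32 variant the question refers to would presumably look like this (my reconstruction; the question only shows the float64 version):

package main

import "fmt"

func main() {
    a := float32(0.2) // the only change: float32 instead of float64
    a += 0.1
    a -= 0.3
    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    // Per the question, this line is never reached: the loop does not terminate.
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}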
Answer 1
Score: 33
Using math.Float32bits and math.Float64bits, you can see how Go represents the different decimal values as an IEEE 754 binary value:
Playground: https://play.golang.org/p/ZqzdCZLfvC
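The answer links to the playground rather than showing its source; a minimal sketch of a program that prints these bit patterns (my own, not necessarily the linked one) could be:

package main

import (
    "fmt"
    "math"
)

func main() {
    values := []float64{0.1, 0.2, 0.3}
    // 32-bit IEEE 754 bit patterns
    for _, v := range values {
        fmt.Printf("float32(%.1f): %032b\n", v, math.Float32bits(float32(v)))
    }
    // 64-bit IEEE 754 bit patterns
    for _, v := range values {
        fmt.Printf("float64(%.1f): %064b\n", v, math.Float64bits(v))
    }
}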
Result:
float32(0.1): 00111101110011001100110011001101
float32(0.2): 00111110010011001100110011001101
float32(0.3): 00111110100110011001100110011010
float64(0.1): 0011111110111001100110011001100110011001100110011001100110011010
float64(0.2): 0011111111001001100110011001100110011001100110011001100110011010
float64(0.3): 0011111111010011001100110011001100110011001100110011001100110011
If you convert these binary representations to decimal values and do your loop, you can see that for float32, the initial value of a will be:
0.20000000298023224
+ 0.10000000149011612
- 0.30000001192092896
= -7.4505806e-9
a negative value that can never sum up to 1.
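A small sketch (mine, not part of the original answer) that rebuilds those decimal values from the float32 bit patterns above and repeats the sum at float64 precision:

package main

import (
    "fmt"
    "math"
)

func main() {
    // Reconstruct the float32 values from the bit patterns listed earlier.
    a := math.Float32frombits(0b00111110010011001100110011001101) // float32(0.2)
    b := math.Float32frombits(0b00111101110011001100110011001101) // float32(0.1)
    c := math.Float32frombits(0b00111110100110011001100110011010) // float32(0.3)
    fmt.Printf("%.17g %.17g %.17g\n", a, b, c)

    // Summing at float64 precision, as the answer does, gives the small
    // negative starting value (about -7.45e-9).
    fmt.Printf("%g\n", float64(a)+float64(b)-float64(c))
}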
So, why does C behave differently?
If you look at the binary patterns (and know a little about how binary values are represented), you can see that Go rounds the last bit, while I assume C just truncates it instead.
So, in a sense, while neither Go nor C can represent 0.1 exactly in a float, Go uses the value closest to 0.1:
Go: 00111101110011001100110011001101 => 0.10000000149011612
C(?): 00111101110011001100110011001100 => 0.09999999403953552
Edit:
I posted a question about how C handles float constants, and from the answer it seems that any implementation of the C standard is allowed to do either. The implementation you tried it with just did it differently than Go.
Answer 2
Score: 17
Agree with ANisus, Go is doing the right thing. Concerning C, I'm not convinced by his guess.
The C standard does not dictate it, but most implementations of libc will convert the decimal representation to the nearest float (at least to comply with IEEE 754-2008 or ISO 10967), so I don't think this is the most probable explanation.
There are several reasons why the C program's behavior might differ... In particular, some intermediate computations might be performed with excess precision (double or long double).
The most probable thing I can think of is that you wrote 0.1 instead of 0.1f in C.
In that case, you may have caused excess precision in the initialization
(you sum float a + double 0.1 => the float is converted to double, then the result is converted back to float).
If I emulate these operations:
float32(float32(float32(0.2) + float64(0.1)) - float64(0.3))
then I find something near 1.1920929e-8f.
After 27 iterations, this sums to 1.6f.
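A sketch of that emulation in Go (my own reading of what the answer describes: 0.1 and 0.3 stay double, so each step gains excess precision before being converted back to float32):

package main

import "fmt"

func main() {
    // Emulate the C version where a is a float but 0.1 and 0.3 are double
    // constants: each operation happens in double precision and the result
    // is then converted back to float32.
    a := float32(0.2)
    a = float32(float64(a) + 0.1)
    a = float32(float64(a) - 0.3)
    fmt.Printf("initial a = %e\n", a) // roughly 1.1920929e-08, per the answer

    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}

Doubling that tiny positive value 27 times lands on 1.6, which matches the C float output reported in the question.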