How to keep precision for big numbers in golang when converting from float to big.Int
Question
Link: https://go.dev/play/p/AySnKAikSRx
I have an input that could be a very big or a very small float and need to convert it to big.Int, but for some reason there is some precision loss.
I understand that this should happen for very small numbers, but why does it happen for a big number, and how can I avoid it?
Answer 1
Score: 2
All positive integers up to 9007199254740992 can be represented in a float64 without any loss of precision. Anything higher, you run the risk of precision loss, which is happening in your case.
To give a basic idea of why:
Say we're inventing an extremely compact scheme for representing floating point numbers using the following formula:
m.mm * 10^+-e
where:
- e = exponent, [1-9]
- m.mm = mantissa [0.01-9.99]
With this, we can figure out what range of values can be represented:
- lowest = 0.01 * 10^-9 = 0.00000000001
- highest = 9.99 * 10^9 = 9990000000
So that's a pretty decent range of numbers.
We can represent a fair few positive integers without any difficulty, e.g.
1 = 1.00 * 10^0
2 = 2.00 * 10^0
3 = 3.00 * 10^0
⋮
10 = 1.00 * 10^1
11 = 1.10 * 10^1
12 = 1.20 * 10^1
⋮
100 = 1.00 * 10^2
101 = 1.01 * 10^2
102 = 1.02 * 10^2
⋮
999 = 9.99 * 10^2
The problem starts when we exceed 9.99 * 10^2. It's not an issue to represent 1000:
1000 = 1.00 * 10^3
But how do we represent 1001? The next possible value is
1.01 * 10^3 = 1010
which is a precision loss of +9, so we have to settle on 1.00 * 10^3, a precision loss of -1.
The above is in essence how this plays out with float64, except in base 2 and with a 52-bit mantissa in play. With all 52 bits set, and then adding one, the value is:
1.0 * 2^53 = 9007199254740992
So all positive integers up to this value can be represented without precision loss. Integers higher than this may incur precision loss - it very much depends on the value.
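This boundary is easy to verify directly; a small sketch showing that 2^53 and 2^53+1 collapse to the same float64 (the constant name is mine):

```go
package main

import "fmt"

func main() {
	const maxExact = 1 << 53 // 9007199254740992, the last integer before gaps appear
	var a float64 = maxExact
	var b float64 = maxExact + 1 // 2^53+1 is not representable; it rounds back down to 2^53
	fmt.Println(a == b)          // true
}
```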
Now, the value referenced in your Go code:
var x float64 = 827273999999999954
There is no way to represent this exact value as a float64.
package main

import (
    "fmt"
)

func main() {
    var x float64 = 827273999999999954
    fmt.Printf("%f\n", x)
}
yields:
827274000000000000.000000
So essentially precision is lost by the time x is initialized. But when does that occur? If we run:
$ go build -o tmp
$ go tool objdump tmp
and search for TEXT main.main(SB), we can find the instruction:
main.go:10 0x108b654 48b840d5cba322f6a643 MOVQ $0x43a6f622a3cbd540, AX
So 0x43a6f622a3cbd540 is being set into AX: this is our float64 value.
package main

import (
    "fmt"
    "math"
)

func main() {
    fmt.Printf("float: %f\n", math.Float64frombits(0x43a6f622a3cbd540))
}
prints
float: 827274000000000000.000000
So the precision has essentially been lost at compile time (which makes sense). On the line of code with big.NewFloat(x).Int(nil), the value being passed as x is 827274000000000000, not 827273999999999954.
> how to avoid it?
With the code you've provided, there is no way.
If you're able to represent the value as an integer:
package main

import (
    "fmt"
    "math/big"
)

func main() {
    var x uint64 = 827273999999999954
    bf := (&big.Float{}).SetUint64(x)
    fmt.Println(bf)
}
yields
8.27273999999999954e+17
which is the value you're expecting. Or alternatively via a string:
package main

import (
    "fmt"
    "math/big"
)

func main() {
    var x string = "827273999999999954"
    bf, ok := (&big.Float{}).SetString(x)
    if !ok {
        panic("failed to set string")
    }
    fmt.Println(bf)
}

which also yields:
8.27273999999999954e+17