Why not treat decimals as integers when converting decimals into binary floating point?
Question
I know that when converting a number into binary we should treat the digits before and after the decimal point differently; e.g., 0.625 should be converted into 0.101. For the digits after the decimal point, we keep multiplying by 2 and taking the integer part, as follows:
0.625 * 2 = 1.25 ---- 1
0.25 * 2 = 0.5 ---- 0
0.5 * 2 = 1 ---- 1
However, this method is not feasible for numbers like 0.1, since the loop is infinite and will give results like 0.0001100110011..., which leads to loss of precision.
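The multiply-by-2 procedure described above can be sketched in Python on exact integers: the digits after the point are treated as a numerator over a power of ten, so no float rounding sneaks in. `fraction_digits_to_binary` and its `max_bits` cut-off are hypothetical names chosen for this illustration.

```python
def fraction_digits_to_binary(frac_digits: str, max_bits: int = 64) -> str:
    """Hypothetical helper: binary expansion of 0.<frac_digits> by repeated doubling.

    "625" means 0.625, held exactly as 625 / 1000.
    """
    scale = 10 ** len(frac_digits)   # denominator: 1000 for "625"
    n = int(frac_digits)             # numerator of the fractional part
    bits = []
    while n and len(bits) < max_bits:
        n *= 2                       # multiply the fraction by 2
        if n >= scale:               # integer part of the product is 1
            bits.append("1")
            n -= scale               # keep only the fractional part
        else:                        # integer part of the product is 0
            bits.append("0")
    return "0." + ("".join(bits) or "0")

print(fraction_digits_to_binary("625"))     # 0.101
print(fraction_digits_to_binary("1", 13))   # 0.0001100110011
```

With "625" the loop ends after three steps because the remainder reaches zero; with "1" (that is, 0.1) it only stops because of the `max_bits` cut-off.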
So why don't we just treat the decimals as integers by removing the decimal point? E.g., for 0.625 we would directly calculate the binary representation of 625 and record the exponent, just as the float type does (10^-3 here). This method can prevent loss of precision in many cases, and it perfectly simulates how humans calculate with decimals.
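A minimal sketch of the proposed (integer, power-of-ten exponent) pair, assuming Python's `fractions.Fraction` for exact arithmetic. The pair itself stores 0.625 or 0.1 exactly; the rounding question comes back when the value has to become a binary float, because that still means dividing the integer by a power of ten in base 2, and the quotient has a finite binary expansion only if its reduced denominator is a power of two. `exact_in_binary` and `to_binary_float` are hypothetical helper names.

```python
from fractions import Fraction

def exact_in_binary(mantissa: int, exp10: int) -> bool:
    """Hypothetical helper: does mantissa * 10**exp10 have a finite binary expansion?"""
    value = Fraction(mantissa) * Fraction(10) ** exp10
    d = value.denominator                 # Fraction keeps this in lowest terms
    return (d & (d - 1)) == 0             # true only for powers of two

def to_binary_float(mantissa: int, exp10: int) -> float:
    """Hypothetical helper: turn the decimal pair into a binary double.
    float(Fraction) rounds the exact quotient to the nearest double."""
    return float(Fraction(mantissa) * Fraction(10) ** exp10)

print(exact_in_binary(625, -3), to_binary_float(625, -3))   # True 0.625
print(exact_in_binary(1, -1), to_binary_float(1, -1))       # False 0.1
```

The double printed as 0.1 is only the nearest binary value; `Fraction(to_binary_float(1, -1))` is not equal to `Fraction(1, 10)`.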
If the original decimal is long, we can just cut it at a maximal length, which doesn't lose much precision. I don't really know why we must use the "multiply by 2" method, which introduces errors for simple numbers like 0.1.
I've tried converting many decimals into binary, and my method works better than the "multiply by 2" method in most cases. Please tell me what I am missing here.
Please tell me why I'm wrong. Thanks!
Answer 1
Score: 1
What you are proposing is called fixed point with a power-of-10 scaling factor. It's doable, and it's also used to prevent rounding errors (for example, in currency computations).
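A small sketch of fixed point with a power-of-10 scaling factor, using money stored as integer cents. The scale of 100, the helper names `to_cents` and `format_cents`, and the simplified parsing (non-negative amounts only, extra decimal places simply cut off) are assumptions for illustration, not a standard API.

```python
SCALE = 100  # assumption: two decimal places (cents)

def to_cents(text: str) -> int:
    """Hypothetical helper: parse a non-negative amount like "19.99" into integer cents."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]          # pad or cut to exactly two decimal places
    return int(whole or "0") * SCALE + int(frac)

def format_cents(cents: int) -> str:
    """Hypothetical helper: render integer cents back as decimal text."""
    return f"{cents // SCALE}.{cents % SCALE:02d}"

float_total = sum([0.1] * 10)                 # binary floating point
cents_total = sum([to_cents("0.10")] * 10)    # decimal fixed point, integer math only

print(float_total)                 # 0.9999999999999999
print(format_cents(cents_total))   # 1.00
```

Every addition here is plain integer addition, so amounts with at most two decimal places never accumulate rounding error.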
However, using the normal binary representation (or fixed point with power-of-2 scaling) is faster and much more convenient (even from a hardware perspective), as it simplifies many operations (*, /, pow, log, exp, ...).
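A simplified illustration of one reason power-of-2 scaling is more convenient: after multiplying two fixed-point numbers, the raw product carries the scale twice and must be rescaled. With binary scaling that is a right shift; with decimal scaling it is an integer division by the scale. `FRAC_BITS`, `DEC_SCALE`, `bin_mul` and `dec_mul` are hypothetical names, and both versions simply floor the result instead of rounding.

```python
FRAC_BITS = 16    # hypothetical binary fixed point: raw = value * 2**16
DEC_SCALE = 1000  # hypothetical decimal fixed point: raw = value * 10**3

def bin_mul(a: int, b: int) -> int:
    # a * b is scaled by 2**32; one right shift brings it back to 2**16
    return (a * b) >> FRAC_BITS

def dec_mul(a: int, b: int) -> int:
    # a * b is scaled by 10**6; an integer division brings it back to 10**3
    return (a * b) // DEC_SCALE

a_bin, b_bin = int(1.5 * 2**FRAC_BITS), int(2.25 * 2**FRAC_BITS)  # 1.5 and 2.25
a_dec, b_dec = 1500, 2250                                         # 1.5 and 2.25

print(bin_mul(a_bin, b_bin) / 2**FRAC_BITS)  # 3.375
print(dec_mul(a_dec, b_dec) / DEC_SCALE)     # 3.375
```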
Also, the same rounding problem arises in base 10 too; just try to write 1/3 in decimal. It is also a never-ending series of digits:
1/3 = 0.33333333333333333333333333...
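The general rule behind both cases can be checked mechanically: a fraction in lowest terms has a finite expansion in base b exactly when every prime factor of its denominator also divides b. So 1/3 repeats in base 10 and in base 2, while 1/10 terminates in base 10 but not in base 2. `terminates_in_base` is a hypothetical helper written for this sketch.

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(q: Fraction, base: int) -> bool:
    """Hypothetical helper: True if q has a finite expansion in `base`."""
    d = q.denominator            # Fraction is always in lowest terms
    g = gcd(d, base)
    while g > 1:                 # strip every prime factor shared with the base
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1                # nothing left means the expansion terminates

for q in (Fraction(1, 3), Fraction(1, 10), Fraction(5, 8)):
    print(q, terminates_in_base(q, 10), terminates_in_base(q, 2))
# 1/3 False False
# 1/10 True False
# 5/8 True True
```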
Answer 2
Score: 0
> However, this method is not feasible for numbers like 0.1, since the loop is infinite and will give results like 0.0001100110011..., which leads to loss of precision.
Mathematically it is infinite, yet 0.1 cannot be encoded exactly in binary floating point. Instead we seek a nearby value like 0.10000000000000000555... or 0x1.999999999999ap-4. After perhaps 53 significant binary digits, we only need a few more to decide how to round.
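In Python the value actually stored for 0.1 can be inspected directly: constructing a `decimal.Decimal` from a float is exact, `float.hex()` prints the same value in hexadecimal notation, and `fractions.Fraction` shows it is an integer over 2**55.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1
print(Decimal(x))                        # 0.1000000000000000055511151231257827021181583404541015625
print(x.hex())                           # 0x1.999999999999ap-4
print(Fraction(x).denominator == 2**55)  # True
```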
Note that doubling the decimal text (the * 2) may require hundreds of decimal digits.
> If the original decimal is long, we can just cut it at maximal length, which didn't lose much precision.
It depends on whether you want a good answer or the best answer.
Consider the typical smallest encodable floating-point value of about 4.9406564584124654e-324, whose exact value as a decimal is ~750 significant decimal digits long.
Now consider the next larger floating point value, again ~750 significant decimal digits long.
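These digit counts can be checked in Python, since `Decimal(float)` converts exactly and `math.ldexp(1.0, -1074)` builds the smallest positive subnormal double. `significant_digits` is a hypothetical helper name.

```python
import math
from decimal import Decimal

smallest = math.ldexp(1.0, -1074)  # smallest positive subnormal double, 2**-1074
next_up = math.ldexp(1.0, -1073)   # the next larger double, 2 * 2**-1074

def significant_digits(x: float) -> int:
    """Hypothetical helper: count the digits of the exact decimal value of x."""
    return len(Decimal(x).as_tuple().digits)

print(repr(smallest), significant_digits(smallest))  # 5e-324 751
print(repr(next_up), significant_digits(next_up))    # 1e-323 750
```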
Consider x, the value halfway between those two, written as decimal text.
Consider values near x, as decimal text (yet with the same first 750 significant decimal digits), that are slightly larger or smaller. Converting those to the best floating-point value requires a maximal length of ~750 digits, so that you do not lose precision and round up or down properly.
To always get the best answer, your cut-off has to be over 750 digits.
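A sketch of this argument in Python, assuming (as in CPython) that `float()` rounds decimal strings correctly, with ties going to even. The midpoint between 2**-1074 and 2**-1073 is 3 * 2**-1075, which as decimal text is the roughly 750-digit integer 3 * 5**1075 times 10**-1075. Texts one unit in the last digit below or above it round to different doubles, and truncating the "above" text to any shorter length flips the result. `MID`, `EXP10`, `text` and `truncated` are hypothetical names.

```python
# Exact midpoint between the two smallest positive doubles:
# (2**-1074 + 2**-1073) / 2 = 3 * 2**-1075 = (3 * 5**1075) * 10**-1075
MID = 3 * 5**1075
EXP10 = -1075

def text(digits: int, exp10: int = EXP10) -> str:
    """Hypothetical helper: build decimal text '<digits>e<exp10>'."""
    return f"{digits}e{exp10}"

def truncated(digits: int, keep: int) -> str:
    """Hypothetical helper: keep only the first `keep` significant digits."""
    s = str(digits)
    return text(int(s[:keep]), EXP10 + len(s) - keep)

below = float(text(MID - 1))  # just under the midpoint: rounds down
tie = float(text(MID))        # exact tie: round half to even picks 2**-1073
above = float(text(MID + 1))  # just over the midpoint: rounds up

# "Cut it at maximal length" of 40 digits: lands below the midpoint, rounds down.
cut40 = float(truncated(MID + 1, 40))

# Smallest number of leading digits that still gives the correctly rounded result.
n_digits = len(str(MID + 1))
needed = next(k for k in range(1, n_digits + 1)
              if float(truncated(MID + 1, k)) == above)

print(below, tie, above, cut40)  # 5e-324 1e-323 1e-323 5e-324
print(needed == n_digits)        # True
```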
In the end, really good conversion of decimal text to floating point is all done with wide integer math.
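A simplified sketch of decimal-text-to-double conversion done purely with big-integer math, assuming Python's arbitrary-precision `int` and exact `fractions.Fraction` parsing. It only handles positive inputs whose result is a normal double (no zero, overflow or subnormal handling), and production converters typically add fast paths and fall back to big integers only when needed, so treat this as an illustration rather than a reference implementation. `decimal_text_to_double` is a hypothetical name.

```python
import math
from fractions import Fraction

def decimal_text_to_double(text: str) -> float:
    """Hypothetical helper: correctly rounded decimal text -> double via integer math.
    Assumes a positive input whose result is a normal double."""
    q = Fraction(text)                      # exact rational value of the text
    n, d = q.numerator, q.denominator
    # Choose e so that n / (d * 2**e) lies in [2**52, 2**53): a 53-bit significand.
    e = n.bit_length() - d.bit_length() - 52

    def scaled_divmod(e: int):
        # integer quotient and remainder of n / (d * 2**e)
        if e >= 0:
            return divmod(n, d << e)
        return divmod(n << -e, d)

    m, r = scaled_divmod(e)
    if m < (1 << 52):                       # estimate was one too high
        e -= 1
        m, r = scaled_divmod(e)
    # Round to nearest, ties to even, using the exact remainder.
    den = d << e if e >= 0 else d
    if 2 * r > den or (2 * r == den and m & 1):
        m += 1
        if m == 1 << 53:                    # rounding carried into a new bit
            m >>= 1
            e += 1
    return math.ldexp(m, e)

print(decimal_text_to_double("0.1") == 0.1)  # True
```

Because the remainder is exact, the tie test `2 * r == den` is reliable; "1e23", which lies exactly halfway between two doubles, rounds to the even one, just as `float("1e23")` does.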