Is there a way to alleviate roundoff errors?

Question


The Wikipedia article on "guard bits" offers this example code:

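#include <stdio.h>
int main(){
   double a;
   int i;

   /* Mathematically 0.2 + 0.1 - 0.3 is 0, but a small residual remains. */
   a = 0.2; 
   a += 0.1; 
   a -= 0.3;

   /* Double the residual until it reaches 1.0, counting the steps. */
   for (i = 0; a < 1.0; i++) 
       a += a;

   printf("i=%d, a=%f\n", i, a);

   return 0;
}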

On my Zen 2 CPU (Ryzen 7 4800H), I saved the code above as Guard_digit.c and compiled it with gcc Guard_digit.c -std=c17 -march=znver2 -pedantic -O0 -o With_Guard_digit.o. It prints the same result as Wikipedia: i=54, a=1.000000.

As this note says, the IEEE standard already provides for guard digits:

> The IEEE standard requires the use of 3 extra bits of less significance than the 24 bits (of mantissa) implied in the single precision representation.
>
> mantissa format plus extra bits:
>
> 1.XXXXXXXXXXXXXXXXXXXXXXX   0   0   0
>
> ^         ^                 ^   ^   ^
> |         |                 |   |   |
> |         |                 |   |   - sticky bit (s)
> |         |                 |   - round bit (r)
> |         |                 - guard bit (g)
> |         - 23 bit mantissa from a representation
> - hidden bit

Q: Is there a way to solve this precision and roundoff problem by changing the source code or by other means (i.e., can the error be reduced to some degree so that the program outputs something like i=108, a=1.000000)?

Edit after viewing the answer by Eric Postpischil:

Sorry for describing the problem unclearly. I want to know how to solve the roundoff problem while keeping the original calculations, so directly writing a = 0; is not what I am looking for.

I want to solve this specific problem, not the general one. This is beyond my current ability, just as the comment says.

Answer 1

Score: 3


> Q: Is there a way to solve this precision and roundoff problem by changing the source code or by other means (i.e., can the error be reduced to some degree so that the program outputs something like i=108, a=1.000000)?

In common C implementations, it is impossible to produce a value of a by adding and/or subtracting values at or above .0625 that will cause the loop shown to terminate after iterating more than 57 times.

This is because common C implementations use IEEE-754 binary64, also called “double precision,” for double, and binary64 uses 53 significant bits. This means that a value in the binade starting at .0625 is represented with a significand whose high bit has position value 2^-4 (.0625) and whose low bit has position value 2^-56 (spanning 53 bits, including both endpoints).

Adding and subtracting can carry bits to higher positions, as taught in elementary school arithmetic, but can never generate non-zero bits below the lowest input position. Therefore, any result produced by adding and subtracting values greater than or equal to .0625 cannot have any non-zero bits below 2^-56.

Therefore, when entering the loop after performing such arithmetic, we have one of the following cases:

  • a is negative or zero, and the loop never terminates.
  • a is 2^-56 or greater, and iterating 57 times or fewer will make it greater than 1.

> Is there a way to solve this precision and roundoff problem by changing the source code…

Obviously the correct result of 0.2 + 0.1 - 0.3 can be obtained by changing the source code from:

a = 0.2; 
a += 0.1; 
a -= 0.3;

to:

a = 0;

This is a common problem in computing: You cannot formally describe the general problem you want to solve by asking “How do I get a solution for these values?”, because then there is a simple solution that is just the one answer for those values, and it does not help you generally.

Instead, you must describe the entire class of problems. For example, you could ask: “How can I write code that finds the exact decimal sum of up to 30 positive and negative decimal numerals with at most three digits after the decimal point and two digits before the decimal point?”

Note further that you do not want to go too far in the other direction, making the problem fully general instead of fully specific. If the problem is to add and subtract any decimal numerals with no error, then you must write arbitrary precision arithmetic. If the problem is to add and subtract some modest number of decimal numerals with some modest number of digits, then the problem may be solvable by using double arithmetic with well-chosen rounding. And specific solutions may depend on the parameters you choose. So you need to characterize the problem well.
