Determining what range of numbers are considered as equal based on a threshold

Question

We cannot reliably compare doubles using ==, so one of the recommended methods is to use a threshold epsilon and decide equality according to the precision standard that threshold defines.
What I have noticed is that, given two doubles, the check usually recommended in relevant posts is Math.abs(a - b) < epsilon, and this is the part that confuses me.
In my understanding, the comparison should take the magnitude of the numbers into account rather than use the raw difference.
Example:
Assume the threshold is 0.000001 and consider the following cases, where we try to establish equality with a:

double a = 57.33;
double b = 57.32973229;
double c = 57.33000002;

b would be rejected since 57.33 - 57.32973229 = 0.00026771 > 0.000001, but to me that sounds quite unreasonable given that the relative error is only about 0.0005% (0.00026771/57.33 ≈ 0.0000047). This is even more obvious with larger and larger (or smaller and smaller) numbers.
c would be accepted since 57.33000002 - 57.33 = 0.00000002 < 0.000001.

Accepting only c as equal seems quite impractical except in very specific situations.
What am I missing or misunderstanding here?

Update:
So why isn't the recommended approach (a - b)/Max(a,b) < epsilon instead?
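
The two checks discussed in the question can be put side by side. The following is a minimal sketch, not a recommended implementation: the class and method names are hypothetical, and the relative check adds Math.abs (an assumption that is not part of the formula as written) so that the sign of a - b does not matter.

```java
public class ToleranceChecks {
    // Absolute check as quoted in the question: |a - b| < epsilon.
    static boolean absoluteEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    // Relative check along the lines of the update. Math.abs is an added
    // assumption so that negative inputs and the order of a and b do not matter.
    static boolean relativeEqual(double a, double b, double epsilon) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        if (scale == 0.0) {
            return true; // both values are zero
        }
        return Math.abs(a - b) / scale < epsilon;
    }

    public static void main(String[] args) {
        double a = 57.33;
        double b = 57.32973229;
        double c = 57.33000002;
        double epsilon = 0.000001;

        System.out.println("absolute a~b: " + absoluteEqual(a, b, epsilon));
        System.out.println("absolute a~c: " + absoluteEqual(a, c, epsilon));
        System.out.println("relative a~b: " + relativeEqual(a, b, epsilon));
        System.out.println("relative a~c: " + relativeEqual(a, c, epsilon));
    }
}
```

With this particular epsilon the relative check also rejects b, because the relative difference is roughly 0.0000047, which is larger than 0.000001. It would accept b with an epsilon of 0.00001, so the choice of threshold matters as much as the choice of formula.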

Answer 1

Score: 3

> In my understanding the comparison of the diff should be taking into account the magnitude of the numbers and not using the direct diff.

In one floating-point operation, the computed result is the real-number result rounded to the nearest representable value. The distances between nearby values are scaled according to the magnitude of the result (with some quantization: the steps stay a constant size throughout one binade [all the numbers represented with the same exponent] and then jump as you transition to a neighboring binade). Therefore, for one floating-point operation, there is a known bound on the error that is proportional to the magnitude of the result.
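
The spacing described above can be observed directly. This is a small sketch using the standard Math.nextUp method; the class name UlpSpacing and the helper gapAbove are hypothetical names chosen for illustration.

```java
public class UlpSpacing {
    // Distance from x to the next larger representable double.
    static double gapAbove(double x) {
        return Math.nextUp(x) - x;
    }

    public static void main(String[] args) {
        // 1.0 and 1.5 share a binade, 2.0 starts the next one.
        double[] samples = {1.0, 1.5, 2.0, 57.33, 1.0e15};
        for (double x : samples) {
            double gap = gapAbove(x);
            System.out.println(x + ": gap = " + gap + ", gap / x = " + (gap / x));
        }
    }
}
```

The gap is the same for 1.0 and 1.5, doubles at 2.0, and stays roughly proportional to the value across the whole range, which is why a single rounding has a bound proportional to the magnitude of the result.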

> So why isn't the recommended approach (a - b)/Max(a,b) < epsilon instead?

When a computation involves multiple operations, there may be a rounding error in each operation. Each of those rounding errors has a known bound that is proportional to the magnitude of that operation's result. But as a computation proceeds through multiple steps, errors from previous steps may be compounded (or canceled) by various operations. So the final error is not simply proportional to Max(a, b); it is affected by the magnitudes of all the intermediate results as well as by interactions between the errors and the operations.
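
A simple way to see errors build up over several steps is to add 0.1 repeatedly. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and the reference value n / 10.0 is the decimal result the programmer presumably intended, so the measured error also includes the fact that 0.1 itself is not exactly representable.

```java
public class AccumulatedError {
    // Adds 0.1 to a running total n times; every addition may round.
    static double repeatedSum(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    // Distance between the accumulated sum and the intended value n / 10.
    static double absoluteError(int n) {
        return Math.abs(repeatedSum(n) - n / 10.0);
    }

    public static void main(String[] args) {
        int[] counts = {10, 1_000, 1_000_000};
        for (int n : counts) {
            double expected = n / 10.0;
            double error = absoluteError(n);
            System.out.println(n + " additions: sum = " + repeatedSum(n)
                    + ", absolute error = " + error
                    + ", relative error = " + (error / expected));
        }
    }
}
```

Both the absolute and the relative error grow with the number of additions, so the size of the final value alone does not say how far off it may be.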

As an example, consider some very large number x that resulted from previous operations. It may have some error e that is large but still proportional to x. If we subtract from x a number y that is near x in magnitude, we get a small result, a, that includes that error e. And that error e is proportional to x, so it may be huge compared to a, far out of proportion to the earlier errors.
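
The subtraction scenario can be reproduced with concrete numbers. This is a minimal sketch with values chosen for illustration (they are not from the original answer), and the class and method names are hypothetical.

```java
public class Cancellation {
    // The exact sum 10000000000000001 is not representable as a double
    // (doubles near 1e16 are 2 apart), so the result is rounded to 1.0E16.
    // The stored x therefore carries an error e of 1.
    static double largeWithError() {
        return 1.0e16 + 1.0;
    }

    // Subtracting a nearby large number leaves a small result that still
    // contains the whole error e. Exact arithmetic would give 1.0.
    static double difference() {
        double y = 1.0e16;
        return largeWithError() - y;
    }

    public static void main(String[] args) {
        double x = largeWithError();
        double a = difference();
        double e = 1.0;          // error in x, known by construction
        double exactA = 1.0;     // what a would be without rounding

        System.out.println("x = " + x + ", a = " + a);
        System.out.println("e relative to x:       " + (e / x));
        System.out.println("e relative to exact a: " + (e / exactA));
    }
}
```

Relative to x the error is about 1e-16, yet relative to the small result it is 100%: a comes out as 0.0 where exact arithmetic gives 1.0, and no threshold scaled by Max(a, b) would account for that.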

This is part of the reason there is no general solution for accepting as equal floating-point numbers that are not equal: Just from the final results a and b alone, we cannot know how much error could have accumulated in them. Max(a, b) is insufficient. The error could be anything between zero and infinity, inclusive, and it could also be NaN. Designing a solution for a particular situation requires knowledge of the computations used to obtain the results and the input data.
