IEEE 754双精度浮点值在范围[0, 1]内可以表示的最大整数范围是什么?

huangapple go评论50阅读模式
英文:

What is the largest integer range from 0 to N-1 that can be represented by an IEEE 754 double-precision floating point value in the range [0, 1)?

问题

如果我们将双精度浮点范围 [0, 1) 映射到整数范围 [0-N),通过将浮点范围乘以整数值 N,那么最大的数 N 是多少,以便在将 float_value * N 的结果截断/向下取整时,能表示所有的数 [0-N)?不使用扩展精度乘法。N 被舍入为可表示的值。

英文:

If we map the double-precision floating-point range [0, 1) to the integer range [0-N) by multiplying the floating-point range by the integer value N then what is the largest number N such that all numbers [0-N) are represented when truncating/flooring the result of float_value * N to an integer? No extended precision multiplication. N is rounded to a representable value.

答案1

得分: 1

First, consider N = 253. For every integer k ∊ [0, 253), k/253 is representable in IEEE-754 double precision (binary64) as +k•2-53, since each number representable in binary64 may be described as a sign and a 53-bit integer multiplied by a power of 2, 2e, for an integer power e ∊ [−1074, +972). Each such k/253 is in [0, 1), and the floating-point multiplication of each such k/253 by N (253) incurs no rounding error, producing exactly k. Therefore, each integer in [0, N) is mapped to by at least one binary64 number in [0, 1).

N cannot be 253+1 since it is not representable (it requires 54 bits to represent, and the binary64 significand has only 53 bits). For larger N, the integer 253+1 is in [0, N) but cannot be the result of any binary64 multiplication because it is not representable.

Therefore 253 is the largest value for N that satisfies the constraints of the question.

英文:

First, consider N = 2<sup>53</sup>. For every integer k ∊ [0, 2<sup>53</sup>), k/2<sup>53</sup> is representable in IEEE-754 double precision (binary64) as +k•2<sup>−53</sup>, since each number representable in binary64 may be described as a sign and a 53-bit integer multiplied by a power of 2, 2<sup>e</sup>, for an integer power e ∊ [−1074, +972). Each such k/2<sup>53</sup> is in [0, 1), and the floating-point multiplication of each such k/2<sup>53</sup> by N (2<sup>53</sup>) incurs no rounding error, producing exactly k. Therefore, each integer in [0, N) is mapped to by at least one binary64 number in [0, 1).

N cannot be 2<sup>53</sup>+1 since it is not representable (it requires 54 bits to represent, and the binary64 significand has only 53 bits). For larger N, the integer 2<sup>53</sup>+1 is in [0, N) but cannot be the result of any binary64 multiplication because it is not representable.

Therefore 2<sup>53</sup> is the largest value for N that satisfies the constraints of the question.

huangapple
  • 本文由 发表于 2023年6月13日 18:39:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/76464038.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定