When converting from decimal to binary floating point values, is rounding used, or truncation?

Question


So, I am trying to understand floating point operations better, and I understand that when arithmetic operations are performed (I'm looking primarily at RISC) a rounding, guard and sticky bit are used for rounding of the result during normalization.

My question is, when we first "create" a binary floating point value, are rounding, guard and sticky bits utilized at all, or is the mantissa simply truncated? Could you not potentially loop forever trying to populate the sticky bit with a fractional value?

For example, if I am using a half-precision float (10-bit mantissa) with the value 2775.0, the binary value is 1010 1101 0111b. The mantissa would therefore be 0101 1010 111. If the last bit is truncated, the mantissa becomes 0101 1010 11 (2774.0). If rounding occurs, the mantissa would become 0101 1011 00 (2776.0).

Which is it?

I'd also be really interested in understanding how the FPU knows when to "stop" looking for the sticky bit when processing a decimal input value.

I've tried reading up on this, but I haven't found much on rounding as it relates to the initial conversion from decimal to binary.

Answer 1

Score: 1


> When converting from decimal to binary floating point values, is rounding used, or truncation?

Potentially both; the rules are highly specification dependent.

IEEE 754 offers various rounding modes. When rounding decimal text or decimal floating point values to binary FP, round-to-nearest with ties-to-even is often used. How the rounding, guard and sticky bits are used depends on the rounding mode selected. Even when some of these bits do not affect the value, they affect flags such as inexact.
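To make the asker's 2775.0 example concrete, here is a minimal sketch that assumes CPython, whose `struct` module supports the binary16 format code `"e"`. The helper name `to_half` is made up for illustration. Note that it converts from a Python `float` (a double) rather than directly from decimal text; 2775.0 is exactly representable as a double, so only one rounding step takes place for that value.

```python
import struct

def to_half(x: float) -> float:
    """Round-trip x through IEEE 754 binary16 and return the value actually stored."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Near 2775 the representable half-precision values are 2 apart (..., 2774, 2776, ...).
for value in (2774.9, 2775.0, 2775.1, 2777.0):
    print(value, "->", to_half(value))
```

With CPython's packing, 2775.0 is a tie and comes back as 2776.0 (the neighbor with the even mantissa), not the truncated 2774.0. 2777.0 is also a tie and comes back as 2776.0, so ties-to-even can round down as well as up.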


IEEE 754 also allows an implementation, when converting decimal text to binary floating point, to ignore/truncate (treat as zeros) significant digits past a certain point. When converting to double, that point is at least 3 digits past the 17 needed for double-text-double round tripping, i.e. 20 or more significant digits.
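Whether digits past that point matter can be probed with a decimal string that sits exactly halfway between two adjacent doubles. The sketch below assumes CPython, whose `float()` is expected to round correctly on all input digits; an implementation that uses the permitted digit limit could legitimately behave differently. The variable names are illustrative only.

```python
# 1 + 2**-53 is exactly halfway between 1.0 and the next double, 1 + 2**-52.
# Since 2**-53 == 5**53 / 10**53, the exact decimal digits come from integer math.
digits = str(10**53 + 5**53)          # 54 digits, the first one is the integer part
tie = digits[0] + "." + digits[1:]    # "1.000000000000000111022302..."

next_up = 1.0 + 2.0**-52              # the double just above 1.0

exact_tie = float(tie)                # a true tie: ties-to-even applies
above_tie = float(tie + "1")          # one extra non-zero digit far to the right
cut_short = float((tie + "1")[:22])   # the same text cut to 21 significant digits

print(exact_tie == 1.0, above_tie == next_up, cut_short == 1.0)
```

Under the stated assumption, the exact tie rounds to 1.0, a single non-zero digit in the 54th decimal place tips the result up to the next double, and cutting the text short first loses that information.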


> Could you not potentially loop forever trying to populate the sticky bit with a fractional value?

A sticky bit only needs, at most, to loop to the last digit of the input encoding. It might involve looking at many digits, but not forever. Often the search can quit early, once a non-zero digit is found. Modern FPUs rarely use a loop at all; instead they evaluate one wide OR over all the remaining bits simultaneously.
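The guard and sticky logic can be sketched on plain integers. This is an illustrative software model, not how any particular FPU is built, and the function name `round_to_precision` is hypothetical. The sticky bit is computed with a single mask-and-test, which plays the role of the wide OR described above.

```python
def round_to_precision(n: int, p: int = 11) -> int:
    """Round a positive integer to p significant bits, round-to-nearest, ties-to-even."""
    shift = n.bit_length() - p
    if shift <= 0:
        return n                                   # already fits in p bits: exact
    kept = n >> shift                              # the p bits that survive
    guard = (n >> (shift - 1)) & 1                 # first discarded bit
    sticky = (n & ((1 << (shift - 1)) - 1)) != 0   # OR of every bit below the guard
    if guard and (sticky or (kept & 1)):           # above halfway, or a tie with odd kept
        kept += 1
    return kept << shift                           # a carry out of kept is still correct here

print(round_to_precision(2775))    # tie, kept part is odd
print(round_to_precision(2773))    # tie, kept part is even
print(round_to_precision(11092))   # 2773 * 4: guard set, sticky clear
print(round_to_precision(11093))   # 2773 * 4 + 1: guard set, sticky set
```

With p = 11 (the 10 stored mantissa bits of half precision plus the implicit leading 1), 2775 rounds to 2776 and 2773 rounds to 2772. The last two calls differ only in the lowest bit, yet give 11088 and 11096: the sticky bit turns an apparent tie into a round-up.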

huangapple
  • Posted on 2023-02-14 01:32:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439329.html