

What is the output of '%b' verb when it is floating number



According to the Go documentation, %b used with a floating-point number means:

> decimalless scientific notation with exponent a power of two,
in the manner of strconv.FormatFloat with the 'b' format,
e.g. -123456p-78

As the code below shows, the program output is

> 8444249301319680p-51

I'm a little confused about %b with floating-point numbers. Can anybody tell me how this result is calculated? Also, what does p- mean?

f := 3.75
fmt.Printf("%b\n", f)
fmt.Println(strconv.FormatFloat(f, 'b', -1, 64))


Score: 4





"Decimalless scientific notation with exponent a power of two" means the following:

8444249301319680*(2^-51) = 3.75 or 8444249301319680/(2^51) = 3.75

p-51 means 2^-51 which can also be calculated as 1/(2^51)
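The arithmetic above can be checked directly in Go. This is a minimal sketch; `fromB` is a hypothetical helper name, and it relies on the standard `math.Ldexp`, which computes `frac × 2^exp`.

```go
package main

import (
	"fmt"
	"math"
)

// fromB rebuilds a float64 from the two integers that %b prints:
// significand * 2^exp. (fromB is an illustrative name, not a library function.)
func fromB(significand int64, exp int) float64 {
	return math.Ldexp(float64(significand), exp)
}

func main() {
	fmt.Println(fromB(8444249301319680, -51)) // 3.75
	fmt.Println(fromB(15, -2))                // 3.75 again, from a shorter significand
}
```

Both pairs describe the same value; %b simply always prints the full 53-bit significand.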

Nice article on Floating-Point Arithmetic.


Score: 1



The five rules of scientific notation are given below:

  1. The base is always 10
  2. The exponent must be a non-zero integer, which means it can be either positive or negative
  3. The absolute value of the coefficient is greater than or equal to 1 but it should be less than 10
  4. The coefficient carries the sign (+) or (-)
  5. The mantissa carries the rest of the significant digits


  • %b scientific notation with exponent a power of two (that is the p)
  • %e scientific notation
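To see the two verbs side by side, here is a small sketch that formats the question's value with both. `verbs` is a hypothetical helper introduced only for this illustration.

```go
package main

import "fmt"

// verbs formats the same value with %e (power-of-ten exponent)
// and %b (power-of-two exponent).
func verbs(f float64) (string, string) {
	return fmt.Sprintf("%e", f), fmt.Sprintf("%b", f)
}

func main() {
	e, b := verbs(3.75)
	fmt.Println(e) // 3.750000e+00
	fmt.Println(b) // 8444249301319680p-51
}
```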


Score: 1









It is worth pointing out that the %b output is particularly easy for the runtime system to generate as well, due to the internal storage format for floating point numbers.

If we ignore "denormalized" floating point numbers (we can add them back later), a floating point number is stored, internally, as 1.bbbbbb...bbb x 2^exp for some set of bits ("b" here), e.g., the value four is stored as 1.000...000 <exp> 2. The value six is stored as 1.100...000 <exp> 2, the value seven is stored as 1.110...000 <exp> 2, and eight is stored as 1.000...000 <exp> 3. The value seven-and-a-half is 1.111 <exp> 2, seven and three quarters is 1.1111 <exp> 2, and so on. Each bit here, in the 1.bbbb, represents the next power of two lower than the exponent.

To print out 1.111 <exp> 2 with the %b format, we simply note that we need four 1 bits in a row, i.e., the value 15 decimal or 0xf or 1111 binary, which causes the exponent to need to be decreased by 3, so that instead of multiplying by 2^2 or 4, we want to multiply by 2^-1 or ½. So we can take the actual exponent (2), subtract 3 (because we moved the "point" three times to print 1111 binary or 15), and hence print out the string 15p-1.

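That's not what Go's %b prints though: for 7.5 it prints 8444249301319680p-50 (and for the question's 3.75, which is half of that, 8444249301319680p-51). This is the same value as 15p-1, so either one would be correct output. But why the longer form?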

Well, 8444249301319680 is, in hexadecimal, 1E000000000000. Expanded into full binary, this is 1 1110 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. That's 53 binary digits. Why 53 binary digits, when four would suffice?

The answer to that is found in the link in Nick's answer: IEEE 754 floating point format uses a 53-digit "mantissa" or "significand" (the latter is the better term and the one I usually try to use, but you'll see the former pop up very often). That is, the 1.bbb...bbb has 52 bs, plus that forced-in leading 1. So there are always exactly 53 binary digits (for IEEE "double precision").
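The hexadecimal and binary expansions quoted above, and the count of 53 digits, can be reproduced with the standard library. A minimal sketch; `describe` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
)

// describe returns the hexadecimal form, the binary form, and the
// number of significant bits of an unsigned integer.
func describe(sig uint64) (string, string, int) {
	return strconv.FormatUint(sig, 16), strconv.FormatUint(sig, 2), bits.Len64(sig)
}

func main() {
	hex, bin, n := describe(8444249301319680)
	fmt.Println(hex) // 1e000000000000
	fmt.Println(bin) // 1111 followed by 49 zeros
	fmt.Println(n)   // 53
}
```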

If we just treat this 53-binary-digit number as an integer and print it in decimal, we never need a decimal point. We only have to adjust the power-of-two exponent to compensate.

In IEEE754 format, the exponent itself is already stored in "excess form", with 1023 added (for double precision again). That means that 1.111000...000 <exp> 2 is actually stored with an exponent value of 2+1023 = 1025. What this means is that to get the actual power of two, the machine code formatting the number is already going to have to subtract 1023. We can just have it subtract 52 more at the same time.
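The stored exponent can be inspected with `math.Float64bits`. The sketch below assumes the standard double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits); `storedExponent` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// storedExponent returns the raw 11-bit exponent field of a float64,
// i.e. the true exponent plus the bias of 1023.
func storedExponent(f float64) int {
	return int(math.Float64bits(f)>>52) & 0x7ff
}

func main() {
	raw := storedExponent(7.5)
	fmt.Println(raw)             // 1025
	fmt.Println(raw - 1023)      // 2, the true power of two
	fmt.Println(raw - 1023 - 52) // -50, the exponent that %b prints
}
```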

Last, because the implied 1 is always there, the internal IEEE754 number doesn't actually store the 1 bit. So to read out the value and convert it, the code internally does:

decimalPart := machineDependentReinterpretation1(&doubleprec_value)
expPart := machineDependentReinterpretation2(&doubleprec_value)

where the machine-dependent-reinterpretation simply extracts the correct bits, puts in the implied 1 bit as needed in the decimal part, subtracts the offset (1023+52) for the exponent part, and then does:

fmt.Sprintf("%dp%d", decimalPart, expPart)

When printing a floating-point number in decimal, the base conversion (from base 2 to base 10) is problematic, requiring a lot of code to get the rounding right. Printing it in binary like this is much easier.

Exercises for the reader, to help with understanding this:

  1. Compute 1.10₂ x 2^2. Note: 1.1₂ is 1½ decimal.
  2. Compute 11.0₂ x 2^1. (11.0₂ is 3.)
  3. Based on the above, what happens as you "slide the binary point" left and right?
  4. (more difficult) Why can we assume a leading 1? If necessary, read on.

Why can we assume a leading 1?

Let's first note that when we use scientific notation in decimal, we can't assume a leading 1. A number might be 1.7 x 10^3, or 5.1 x 10^5, or whatever. But when we use scientific notation "correctly", the first digit is never zero. That is, we do not write 0.3 x 10^0 but rather 3.0 x 10^-1. In this kind of notation, the number of digits tells us about the precision, and the first digit never has to be zero and generally isn't supposed to be zero. If the first digit were zero, we just move the decimal point and adjust the exponent (see exercises 1 and 2 above).

The same rules apply with floating-point numbers. Instead of storing binary 0.01, for instance, we just slide the binary point two positions over and get 1.00, and decrease the exponent by 2. If we wanted to store binary 11.1, we slide the binary point one position the other way and increase the exponent by 1. Whenever we do this, the first digit always winds up being a one.
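The sliding of the binary point can be observed with `math.Frexp`. Note that `Frexp` returns a fraction in [0.5, 1), so the sketch below rescales it to the 1.bbb form used in this answer; `normalize` is a hypothetical helper and is meant only for positive, finite, normalized inputs.

```go
package main

import (
	"fmt"
	"math"
)

// normalize splits f into m * 2^e with 1 <= m < 2, which is the
// "leading 1" form described above.
func normalize(f float64) (float64, int) {
	frac, exp := math.Frexp(f) // f == frac * 2^exp, with 0.5 <= frac < 1
	return frac * 2, exp - 1
}

func main() {
	fmt.Println(normalize(0.25)) // 1 -2    (binary 0.01 becomes 1.00 x 2^-2)
	fmt.Println(normalize(3.5))  // 1.75 1  (binary 11.1 becomes 1.11 x 2^1)
}
```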

There is one big exception here, which is: when we do this, we can't store zero! So we don't do this for the number 0.0. In IEEE754, we store 0.0 as all-zero-bits (except for the sign, which we can set to store -0.0). This has an all-zero exponent, which the computer hardware handles as a special case.

Denormalized numbers: when we can't assume a leading 1

This system has one notable flaw (which isn't entirely fixed by denorms, but nonetheless, IEEE has denorms). That is: the smallest number we can store "abruptly underflows" to zero. Kahan has a 15 page "brief tutorial" on gradual underflow, which I am not going to attempt to summarize, but when we hit the minimum allowed exponent (2^-1022 for normalized double-precision values) and want to "get smaller", IEEE lets us stop using these "normalized" numbers with the leading 1 bit.

This doesn't affect the way that Go itself formats floating point numbers, because Go just takes the entire significand "as is". All we have to do is stop inserting the 2^52 "implied 1" when the input value is a denormalized number (and treat its all-zero exponent field as if it held 1), and everything else Just Works. We can hide this magic inside the machine-dependent float64 reinterpretation code, or do it explicitly in Go, whichever is more convenient.

  • Published on August 13, 2021 at 16:19:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68769014.html



