What does "%b" do in fmt.Printf for float64, and what is the minimum positive subnormal float64 in binary format?


Question

The Go documentation for package fmt says:

> Floating-point and complex constituents:
>
>     %b  decimalless scientific notation with exponent a power of two,
>         in the manner of strconv.FormatFloat with the 'b' format,
>         e.g. -123456p-78

Code:

fmt.Printf("0b%b\n", 255) // 0b11111111
fmt.Printf("%b\n", 1.0)   // 4503599627370496p-52 

What is 4503599627370496p-52?

Answer 1

Score: 4

I did some research and spent many hours on the IEEE 754 binary representation.

A good starting point is:

https://en.wikipedia.org/wiki/Double-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_floating_point

Results:

package main
import (
	"fmt"
	"math"
	"strconv"
)

func main() {
	fmt.Printf("0b%b\n", 255) //0b11111111

	fmt.Printf("%b\n", 1.0)               //4503599627370496p-52
	fmt.Printf("%#X\n", 4503599627370496) //0X10000000000000
	//float64 1.0 has the bit pattern 0X3FF0000000000000
	//%b prints significand*2**exponent, where the significand is "1"+Fraction read as an integer:
	//4503599627370496*2**-52 = 0X10000000000000*2**-52 = 2**52 * 2**-52 = 2**0 = 1

	//layout: 1 sign bit + 11 biased-exponent bits + 52 fraction bits = 64 bits
	fmt.Printf("%#X\n", math.Float64bits(1.0)) //1.0=0X3FF0000000000000
	//exponent field: 0x3FF = 1023, bias = 1023, so the scale is pow(2, 0x3FF-1023) = pow(2, 0) = 1
	//(normal numbers use biased exponents from 1 (Emin) to 2046 (Emax))
	//significand: 1.fraction (53 bits including the implicit leading 1), here 1.0000000000000

	//1.0000000000000002, the smallest number > 1
	fmt.Printf("%#X\n", math.Float64bits(1.0000000000000002)) //0X3FF0000000000001

	// 1.0000000000000004, the next number after 1.0000000000000002
	fmt.Printf("%#X\n", math.Float64bits(1.0000000000000004)) //0X3FF0000000000002

	fmt.Printf("%#X\n", math.Float64bits(2.0))  //0X4000000000000000
	fmt.Printf("%#X\n", math.Float64bits(-2.0)) //0XC000000000000000

	// Min subnormal positive double
	fmt.Printf("%v\n", math.Float64frombits(1))   //5e-324
	fmt.Printf("%#X\n", math.Float64bits(5e-324)) //0X0000000000000001
	//2**(-1022-52) = 2**-1074, about 4.94e-324, which prints as 5e-324

	//Max subnormal double
	fmt.Printf("%v\n", math.Float64frombits(0x000fffffffffffff)) //2.225073858507201e-308

	fmt.Printf("%v\n", math.Float64frombits(0X0000000000000000)) //0
	fmt.Printf("%v\n", math.Float64frombits(0X8000000000000000)) //-0
	fmt.Printf("%v\n", math.Float64frombits(0X7FF0000000000000)) //+Inf
	fmt.Printf("%v\n", math.Float64frombits(0XFFF0000000000000)) //-Inf
	fmt.Printf("%v\n", math.Float64frombits(0x7fffffffffffffff)) //NaN

	fmt.Printf("%#X\n%[1]b\n", math.Float64bits(0.1)) //0X3FB999999999999A
	//0 01111111011 1001100110011001100110011001100110011001100110011010
	//(sign, exponent, fraction; %b itself prints 62 digits because the two leading zeros are dropped)

	fmt.Printf("%#X\n%[1]b\n", math.Float64bits(0.2)) //0X3FC999999999999A
	//11111111001001100110011001100110011001100110011001100110011010

	fmt.Printf("%#X\n%[1]b\n", math.Float64bits(0.3)) //0X3FD3333333333333
	//11111111010011001100110011001100110011001100110011001100110011
	fmt.Println(1.0 / 3.0) //0.3333333333333333
	//By default, 1/3 rounds down, instead of up like single precision,
	//because of the odd number of bits in the significand
	fmt.Printf("%#X\n%[1]b\n", math.Float64bits(1.0/3.0)) //0X3FD5555555555555
	//11111111010101010101010101010101010101010101010101010101010101
	/*
	   Given the hexadecimal representation 0x3FD5 5555 5555 5555,
	     Sign = 0
	     Exponent = 0x3FD = 1021
	     Exponent Bias = 1023 (constant value)
	     Fraction = 0x5 5555 5555 5555
	     Value = 2**(Exponent − Exponent Bias) × 1.Fraction // Note that Fraction must not be converted to decimal here
	           = 2**−2 × (0x15 5555 5555 5555 × 2**−52)
	           = 2**−54 × 0x15 5555 5555 5555
	           = 0.333333333333333314829616256247390992939472198486328125
	           ≈ 1/3
	*/

	var f float64 = 0.1
	var bits uint64 = math.Float64bits(f) //IEEE 754 binary representation of f
	fmt.Printf("%#X %[1]b\n", bits)
	//0X3FB999999999999A 11111110111001100110011001100110011001100110011001100110011010

	fmt.Printf("%b\n", f) //7205759403792794p-56
	fmt.Printf("%b  %b\n", 7205759403792794, -56)
	//11001100110011001100110011001100110011001100110011010  -111000
	fmt.Println(len("11001100110011001100110011001100110011001100110011010"))
	//53: the significand is the implicit leading 1 followed by the 52 fraction bits at the right of the bit pattern above

	fmt.Printf("-: %b\n", math.Float64bits(-0.1e+308)) //the leftmost bit is the sign bit
	//1 11111111010 1100011110110001111100111100101011000111010000110011
	fmt.Printf("exp: %b\n", math.Float64bits(+0.1e-308))
	//101110000001010101110010011010001111110110101111
	//1e-309 is subnormal: the sign bit and all 11 exponent bits are 0, so only the 48 significant fraction bits are printed

	//the 11 exponent bits of -0.1e+308:
	i, err := strconv.ParseInt("11111111010", 2, 64) //2042
	fmt.Println("E", i-1023)                         //E 1019

	fmt.Printf("%b\n", 0.2) //7205759403792794p-55
	fmt.Printf("%b\n", 0.3) //5404319552844595p-54

	n, err := fmt.Printf("%b %b\n", 1.0, math.Float64bits(1.0))
	//4503599627370496p-52 11111111110000000000000000000000000000000000000000000000000000
	fmt.Println(n, err) //84 <nil>
	//no err
	fmt.Printf("'%[1]*.[2]*[3]f'\n", 12, 4, 1234.1234) //'   1234.1234'

}
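
The comments in the program read `4503599627370496p-52` as significand × 2**exponent. A minimal sketch to check that reading follows. It assumes Go 1.18 or later for `strings.Cut`, and `parseB` is a hypothetical helper written for this illustration, not a standard library function. It splits the `%b` text at `p` and rebuilds the value with `math.Ldexp`:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseB is a hypothetical helper (not part of the standard library).
// It splits the "%b" text form "<significand>p<exponent>" and rebuilds
// the float64 as significand × 2**exponent.
func parseB(s string) (float64, error) {
	mantStr, expStr, ok := strings.Cut(s, "p")
	if !ok {
		return 0, fmt.Errorf("no 'p' in %q", s)
	}
	mant, err := strconv.ParseInt(mantStr, 10, 64)
	if err != nil {
		return 0, err
	}
	exp, err := strconv.Atoi(expStr)
	if err != nil {
		return 0, err
	}
	// |mant| is below 2**53, so the conversion to float64 is exact.
	// math.Ldexp(frac, exp) returns frac × 2**exp.
	return math.Ldexp(float64(mant), exp), nil
}

func main() {
	for _, x := range []float64{1.0, 0.1, -2.0, 5e-324} {
		s := fmt.Sprintf("%b", x)
		back, err := parseB(s)
		fmt.Println(s, back == x, err)
	}
	// 4503599627370496p-52 true <nil>
	// 7205759403792794p-56 true <nil>
	// -4503599627370496p-51 true <nil>
	// 1p-1074 true <nil>
}
```

Every value round-trips exactly, including the minimum subnormal `1p-1074`, which suggests the `%b` form loses no information about a finite float64.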

Conclusion:
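 `%b` for a `float64` prints the 53-bit **significand** as a decimal integer, followed by `p` and the power-of-two exponent, so `4503599627370496p-52` means 4503599627370496 × 2**-52 = 1.0. The minimum positive subnormal double has the bit pattern 0X0000000000000001 and the value 2**-1074, which prints as 5e-324.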
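
To connect the `%b` text with the raw bit fields, here is a small sketch that rebuilds the same string from `math.Float64bits`. The function `formatB` is an illustrative name introduced here, not a library API, and it deliberately ignores NaN and ±Inf. It assumes the usual IEEE 754 double layout (1 sign bit, 11 exponent bits, 52 fraction bits, bias 1023):

```go
package main

import (
	"fmt"
	"math"
)

// formatB is an illustrative re-derivation of the "%b" text from the raw
// IEEE 754 bits. It does not handle NaN or ±Inf.
func formatB(x float64) string {
	bits := math.Float64bits(x)
	sign := ""
	if bits>>63 == 1 {
		sign = "-"
	}
	exp := int(bits>>52) & 0x7FF // 11-bit biased exponent field
	mant := bits & (1<<52 - 1)   // 52-bit fraction field
	if exp == 0 {
		exp = 1 // subnormal (or zero): no implicit 1, exponent pinned at Emin
	} else {
		mant |= 1 << 52 // normal: restore the implicit leading 1
	}
	// Remove the bias (1023) and shift by 52 so the significand is an integer.
	return fmt.Sprintf("%s%dp%+d", sign, mant, exp-1023-52)
}

func main() {
	values := []float64{
		1.0,
		0.1,
		math.SmallestNonzeroFloat64,              // min positive subnormal, bits = 1
		math.Float64frombits(0x000FFFFFFFFFFFFF), // max subnormal
	}
	for _, x := range values {
		fmt.Println(formatB(x), formatB(x) == fmt.Sprintf("%b", x))
	}
	// 4503599627370496p-52 true
	// 7205759403792794p-56 true
	// 1p-1074 true
	// 4503599627370495p-1074 true
}
```

For subnormals the significand has no implicit leading 1 and the exponent stays at -1074, which is why the minimum positive subnormal prints as `1p-1074`.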




huangapple
  • Published on April 11, 2016 at 07:53:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/36537806.html