How is C++ automatically deciding that a variable is being stored as subnormal in this program?

Question

I am learning C++ and I am trying to get a handle on how numbers are stored and manipulated in memory. To test my understanding, I wrote the following program:

#include <iostream>
#include <cmath>

int main() {
    int answer = 200*300*400*500;

    std::cout << "int result: " << answer << std::endl;
    std::cout << "Size of int result: " << sizeof(answer) << std::endl;
    std::cout << "Is normal? " << std::isnormal(answer) << std::endl;

    float f_answer = 200.0*300*400*500;

    std::cout << "float result: " << f_answer << std::endl;
    std::cout << "Size of float result: " << sizeof(f_answer) << std::endl;
    std::cout << "Is normal? " << std::isnormal(f_answer) << std::endl;

    float under = 0.0000000005 * 0.0000000001 * 0.0000000001 * 0.0000000001 * 0.00001;
    std::cout << "under result: " << under << std::endl;
    std::cout << "Is normal?: " << std::isnormal(under) << std::endl;

    return 0;
}

Based on everything that I have been learning about IEEE floating point, I was expecting the last block to return an underflow value (or 0), because it was my understanding that the smallest magnitude a 32-bit float can hold is around 10^-38.

To my surprise, I was able to get it all the way down to 10^-45! This made me start to look into normal vs. subnormal. So, awesome...something, somewhere is figuring out that I want more precision and deciding that this variable should be stored as subnormal. My question is: how? What is doing this? Is this C++ itself? Is it the compiler? (I'm using clang, to the best of my knowledge.) I thought the major benefit of using a language like this is that I can be pretty darn sure of what it's doing when I tell it to save as a float. My apologies if this is an ill-formed question; this is truly my first foray into lower-level languages, and any clarification that folks can provide would be greatly appreciated!

Answer 1

Score: 1

A number, other than zero, is subnormal if its magnitude is below the normal range of the floating-point format.

The format commonly used for float is IEEE-754 binary32, also called single-precision. In this format, finite numbers are represented as $\pm 2^{e} \cdot f$, where $e$ is an integer satisfying $-126 \le e \le 127$ and $f$ is the number represented by a 24-bit binary numeral of the form $d.ddddddddddddddddddddddd_2$, where each $d$ represents a bit.

For all the normal numbers, the first $d$ is 1, so the numeral has the form $1.ddddddddddddddddddddddd_2$, and $f$ satisfies $1 \le f < 2$. The smallest normal number is $+2^{-126} \cdot 1.00000000000000000000000_2$, which is approximately $1.1755 \cdot 10^{-38}$. The largest representable finite number is $+2^{127} \cdot 1.11111111111111111111111_2$, which is approximately $3.40282 \cdot 10^{38}$.

For the subnormal numbers, the first $d$ is 0, and $e$ is $-126$. The largest subnormal number is $+2^{-126} \cdot 0.11111111111111111111111_2$, so it is slightly less than the smallest normal number. The smallest subnormal number is $+2^{-126} \cdot 0.00000000000000000000001_2$, which is approximately $1.40130 \cdot 10^{-45}$.

To encode floating-point representations in 32 bits, the first bit is 0 for + and 1 for −. For normal numbers, the next eight bits are the binary for e+127, so they are a value from 1 to 254, inclusive. The remaining 23 bits are the 23 d bits after the “.”. The first bit for f is known to be 1 when the exponent code is 1 to 254.

For subnormal numbers, the eight bits after the sign bit are 0, and the remaining 23 bits are the 23 d bits after the “.”. The first bit for f is known to be 0 when the exponent code is 0.

The exponent code 255 is used to represent infinities and NaNs (meaning Not a Number).
