How is C++ automatically deciding that a variable is stored as subnormal in this program?
Question
I am learning C++, and I am really trying to get a handle on how numbers are stored and manipulated in memory. To test my understanding, I wrote the following program:
```cpp
#include <iostream>
#include <cmath>

int main() {
    // 200*300*400*500 = 12,000,000,000 does not fit in a 32-bit int,
    // so this is signed overflow, which is undefined behavior in C++.
    int answer = 200*300*400*500;
    std::cout << "int result: " << answer << std::endl;
    std::cout << "Size of int result: " << sizeof(answer) << std::endl;
    std::cout << "Is normal? " << std::isnormal(answer) << std::endl;

    // The product is computed in double, then converted to float.
    float f_answer = 200.0*300*400*500;
    std::cout << "float result: " << f_answer << std::endl;
    std::cout << "Size of float result: " << sizeof(f_answer) << std::endl;
    std::cout << "Is normal? " << std::isnormal(f_answer) << std::endl;

    // Also computed in double (about 5e-45), then converted to float.
    float under = 0.0000000005 * 0.0000000001 * 0.0000000001 * 0.0000000001 * 0.00001;
    std::cout << "under result: " << under << std::endl;
    std::cout << "Is normal?: " << std::isnormal(under) << std::endl;
    return 0;
}
```
Based on everything I have been learning about IEEE 754, I expected the last block to produce an underflow value (or 0), because my understanding was that the smallest magnitude a 32-bit float can hold is around 1e-38.
To my surprise, I was able to get it all the way down to 1e-45! This made me start to look into normal vs. subnormal numbers. So, awesome...something, somewhere is figuring out that I want more precision and deciding that this variable should be stored as subnormal. My question is: how? What is doing this? Is this C++ itself? Is it the compiler? (I'm using Clang, to the best of my knowledge.) I thought the major benefit of using a language like this is that I can be pretty darn sure of what it's doing when I tell it to save as a float. My apologies if this is an ill-formed question; this is truly my first foray into lower-level languages, and any clarification that folks can provide would be greatly appreciated!
Answer 1
Score: 1
A number, other than zero, is subnormal if its magnitude is below the normal range of the floating-point format.
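A minimal C++ sketch of that definition, assuming `float` is IEEE-754 binary32: a value is subnormal when it is nonzero, finite, and smaller in magnitude than `std::numeric_limits<float>::min()`, the smallest normal value. The helper names `is_subnormal_by_definition` and `agrees_with_fpclassify` are hypothetical; the standard `std::fpclassify` is used only as a cross-check.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper that applies the definition directly:
// nonzero, finite, and below the smallest normal float (2^-126).
bool is_subnormal_by_definition(float x) {
    float mag = std::fabs(x);
    return x != 0.0f && std::isfinite(x) &&
           mag < std::numeric_limits<float>::min();
}

// Cross-check against the standard library's own classification.
bool agrees_with_fpclassify(float x) {
    bool lib = std::fpclassify(x) == FP_SUBNORMAL;
    return lib == is_subnormal_by_definition(x);
}
```

Zero is excluded explicitly because its magnitude is also below the normal range, yet it is classified as zero, not subnormal.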
The format commonly used for float is IEEE-754 binary32, also called single-precision. In this format, finite numbers are represented as ±2<sup>e</sup>•f, where e is an integer satisfying −126 ≤ e ≤ 127 and f is the number represented by a 24-bit binary numeral of the form d.ddddddddddddddddddddddd<sub>2</sub>, where each d represents a bit.
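To make the ±2<sup>e</sup>•f form concrete, here is a sketch (assuming binary32 `float`; the names `Parts` and `split` are hypothetical) that recovers e and f from a finite, nonzero float. `std::ilogb` gives floor(log2|x|), and clamping it at −126 reproduces the rule that subnormals share the minimum exponent.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: |x| == 2^e * f.
struct Parts {
    int e;     // exponent, -126 <= e <= 127
    double f;  // 1 <= f < 2 for normal values, 0 < f < 1 for subnormals
};

// Splits a finite, nonzero float; the sign (the ± in the formula) is dropped.
Parts split(float x) {
    assert(std::isfinite(x) && x != 0.0f);
    float mag = std::fabs(x);
    int e = std::ilogb(mag);  // floor(log2(mag)); subnormals give e < -126
    if (e < -126) e = -126;   // subnormals all use the minimum exponent
    // Scaling by a power of two is exact in double, so all 24 bits of f survive.
    double f = std::ldexp(static_cast<double>(mag), -e);
    return {e, f};
}
```

For example, `split(6.0f)` gives e = 2 and f = 1.5, since 6 = 2<sup>2</sup>•1.1<sub>2</sub>.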
For all the normal numbers, the first d is 1, so the numeral has the form 1.ddddddddddddddddddddddd<sub>2</sub>, and f satisfies 1 ≤ f < 2. The smallest normal number is +2<sup>−126</sup>•1.00000000000000000000000<sub>2</sub>, which is approximately 1.1755•10<sup>−38</sup>. The largest representable finite number is +2<sup>127</sup>•1.11111111111111111111111<sub>2</sub>, which is approximately 3.40282•10<sup>38</sup>.
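The two endpoints of the normal range can be built straight from those formulas and compared with the constants the implementation reports. This is a sketch assuming binary32 `float`; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Smallest normal value: 2^-126 * 1.000...0 (binary).
float smallest_normal() { return std::ldexp(1.0f, -126); }

// Largest finite value: 2^127 * 1.111...1 (23 ones after the point),
// i.e. 2^127 * (2 - 2^-23).
float largest_finite() {
    return std::ldexp(2.0f - std::ldexp(1.0f, -23), 127);
}

// True if the formulas agree with the implementation's own constants.
bool matches_numeric_limits() {
    return smallest_normal() == std::numeric_limits<float>::min() &&
           largest_finite() == std::numeric_limits<float>::max();
}
```

Printed with `std::printf("%g", ...)`, these come out as 1.17549e-38 and 3.40282e+38, matching the approximations above.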
For the subnormal numbers, the first d is 0, and e is −126. The largest subnormal number is +2<sup>−126</sup>•0.11111111111111111111111<sub>2</sub>, so it is slightly less than the smallest normal number. The smallest subnormal number is +2<sup>−126</sup>•0.00000000000000000000001<sub>2</sub>, which is approximately 1.40130•10<sup>−45</sup>.
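This also bears on what "decides" in the question's program. As a sketch, assuming binary32 `float` and the default round-to-nearest mode: the product is computed in `double`, where about 5•10<sup>−45</sup> is an ordinary normal value, and initializing a `float` from it rounds to the nearest representable `float`. Near zero the representable floats are the multiples of 2<sup>−149</sup>, so the result is 4•2<sup>−149</sup>, which is simply a subnormal value of the format; no separate mode is selected. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Largest subnormal: 2^-126 * 0.111...1 (23 ones) = 2^-126 * (1 - 2^-23).
float largest_subnormal() {
    return std::ldexp(1.0f - std::ldexp(1.0f, -23), -126);
}

// Smallest subnormal: 2^-126 * 0.000...01 = 2^-149.
float smallest_subnormal() { return std::ldexp(1.0f, -149); }

// The question's expression: evaluated in double (about 5e-45), then
// rounded to the nearest float when it initializes `under`.
float question_under() {
    float under = 0.0000000005 * 0.0000000001 * 0.0000000001 *
                  0.0000000001 * 0.00001;
    return under;
}
```

Since 5•10<sup>−45</sup> / 2<sup>−149</sup> ≈ 3.57, the nearest multiple is 4•2<sup>−149</sup>; printed with `std::cout` at the default precision it shows as `5.60519e-45`, consistent with the e-45 result described in the question.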
To encode floating-point representations in 32 bits, the first bit is 0 for + and 1 for −. For normal numbers, the next eight bits are the binary for e+127, so they are a value from 1 to 254, inclusive. The remaining 23 bits are the 23 d bits after the “.”. The first bit for f is known to be 1 when the exponent code is 1 to 254.
For subnormal numbers, the eight bits after the sign bit are 0, and the remaining 23 bits are the 23 d bits after the “.”. The first bit for f is known to be 0 when the exponent code is 0.
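The encoding rules for normal and subnormal numbers can be sketched as a decoder. This assumes `float` is 32-bit IEEE-754 binary32 and copies its bits with `std::memcpy`; the names `bits_of` and `decode_finite` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Copies the 32 bits of a float into an integer (assumes binary32 float).
std::uint32_t bits_of(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Rebuilds a finite value from its fields:
//   exponent code 1..254: leading bit of f is 1, e = code - 127
//   exponent code 0:      leading bit of f is 0, e = -126 (subnormal or zero)
double decode_finite(std::uint32_t u) {
    std::uint32_t sign = u >> 31;
    std::uint32_t code = (u >> 23) & 0xFFu;  // 8 exponent bits
    std::uint32_t frac = u & 0x7FFFFFu;      // the 23 d bits after the point
    assert(code != 255 && "code 255 encodes infinity or NaN");
    double f = std::ldexp(static_cast<double>(frac), -23);  // 0.ddd...d
    int e;
    if (code == 0) {
        e = -126;  // subnormal (or zero)
    } else {
        f += 1.0;  // normal: add the implicit leading 1
        e = static_cast<int>(code) - 127;
    }
    double value = std::ldexp(f, e);
    return sign ? -value : value;
}
```

For every finite `x`, `decode_finite(bits_of(x)) == x` should hold. In C++20, `std::bit_cast<std::uint32_t>(x)` from `<bit>` can replace the `memcpy`.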
The exponent code 255 is used to represent infinities and NaNs (meaning Not a Number).
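Putting the exponent codes together, a classifier can work from the bit fields alone. In exponent code 255 the fraction bits tell the two cases apart: all zero means infinity, anything else is a NaN. This is a sketch assuming IEEE-754 binary32 `float`; `Kind` and `classify_bits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE-754 binary32 float");

// Hypothetical classification driven purely by the exponent code and fraction.
enum class Kind { Zero, Subnormal, Normal, Infinity, NaN };

Kind classify_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    std::uint32_t code = (u >> 23) & 0xFFu;  // exponent code
    std::uint32_t frac = u & 0x7FFFFFu;      // 23 fraction bits
    if (code == 255) return frac == 0 ? Kind::Infinity : Kind::NaN;
    if (code == 0) return frac == 0 ? Kind::Zero : Kind::Subnormal;
    return Kind::Normal;  // codes 1..254
}
```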