In C, how do you print a float/double as a string and read it back as the same float?
Question
I would like to know the easiest, most portable and generally considered best practice to achieve this, that works for any number. I also would like the string associated with the number to be in decimal representation, without scientific notation if possible.
Answer 1
Score: 10
There are two questions:
- What format do you need, and
- How many significant digits do you need?
You said you wanted to avoid scientific notation if possible, and that's fine, but printing numbers like 0.00000000000000000123 or 12300000000000000000 gets kind of unreasonable, so you might want to switch to scientific notation for really big or really small numbers.
As it happens, there's a printf format that does exactly that: %g. It acts like %f if it can, but switches to %e if it has to.
And then there's the question of the number of digits. You need enough digits to preserve the internal precision of the float or double value. To make a long story short, the number of digits you want is the predefined constant FLT_DECIMAL_DIG or DBL_DECIMAL_DIG.
So, putting this all together, you can convert a float to a string like this:
sprintf(str, "%.*g", FLT_DECIMAL_DIG, f);
The technique for a double is perfectly analogous:
sprintf(str, "%.*g", DBL_DECIMAL_DIG, d);
In both cases, we use an indirect technique to tell %g how many significant digits we want. We could have used %g to let it pick, or we could have used something like %.10g to request 10 significant digits, but here we use %.*g, where the * says to use a passed-in argument to specify the number of significant digits. This lets us plug in the exact value FLT_DECIMAL_DIG or DBL_DECIMAL_DIG from <float.h>.
(There's also the question of how big a string you might need. More on this below.)
And then you can convert back from a string to a float or double using atof, strtod, or sscanf:
f = atof(str);
d = strtod(str, &str2);
sscanf(str, "%g", &f);
sscanf(str, "%lg", &d);
(By the way, scanf and friends don't really care about the format so much: you could use %e, %f, or %g, and they'd all work exactly the same.)
Here is a demonstration program tying all of this together:
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main()
{
    double d1, d2;
    char str[DBL_DECIMAL_DIG + 10];

    while(1) {
        printf("Enter a floating-point number: ");
        fflush(stdout);

        if(scanf("%lf", &d1) != 1) {
            printf("okay, we're done\n");
            break;
        }

        printf("you entered: %g\n", d1);

        snprintf(str, sizeof(str), "%.*g", DBL_DECIMAL_DIG, d1);

        printf("converted to string: %s\n", str);

        d2 = strtod(str, NULL);

        printf("converted back to double: %g\n", d2);

        if(d1 != d2)
            printf("whoops, they don't match!\n");

        printf("\n");
    }
}
This program prompts for a double value d1, converts it to a string, converts it back to a double d2, and checks to make sure the values match. There are several things to note:
- The code picks a size char str[DBL_DECIMAL_DIG + 10] for the converted string. That should always be enough for the digits, a sign, an exponent, and the terminating '\0'.
- The code uses the (highly recommended) alternative function snprintf instead of sprintf, so that the destination buffer size can be passed in, to make sure it doesn't overflow if by some mischance it's not big enough after all.
- You will hear it said that you should never compare floating-point numbers for exact equality, but this is a case where we want to! If after going around the barn, d1 is not exactly equal to d2, something has gone wrong.
- Although this code checks to make sure that d1 == d2, it quietly glosses over the fact that d1 might not have been exactly equal to the number you entered! Most real numbers (and most decimal fractions) cannot be represented exactly as a finite-precision float or double value. If you enter a seemingly "simple" fraction like 0.1 or 123.456, d1 will not have exactly that value. d1 will be a very close approximation, and then, assuming everything else works correctly, d2 will end up containing exactly the same very close approximation. To see what's really going on here, you can increase the precision printed by the "you entered" and "converted back to double" lines. See also Is floating-point math broken?
- The precision we give to %g when we say %.10g or %.*g is a count of significant digits, not just a count of places past the decimal point. For example, the numbers 1234000, 12.34, 0.1234, and 0.00001234 all have four significant digits.
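If you would rather not reason about the buffer size at all, one option is to let snprintf report the length first and allocate exactly that much. The following is a minimal sketch of that idea, not part of the original answer; the helper name double_to_string is hypothetical, and DBL_DECIMAL_DIG assumes a C11 <float.h>.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed string holding d printed with
 * DBL_DECIMAL_DIG significant digits, or NULL on failure.
 * The caller is responsible for calling free() on the result. */
char *double_to_string(double d)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and only
     * reports the length it would need (excluding the final '\0'). */
    int len = snprintf(NULL, 0, "%.*g", DBL_DECIMAL_DIG, d);
    if (len < 0)
        return NULL;

    char *str = malloc((size_t)len + 1);
    if (str == NULL)
        return NULL;

    snprintf(str, (size_t)len + 1, "%.*g", DBL_DECIMAL_DIG, d);
    return str;
}
```

A caller would use it as char *s = double_to_string(d); followed by strtod(s, NULL) and free(s). The cost is one extra formatting pass and a heap allocation, so the fixed-size buffer above remains the simpler choice when the format is known.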
Up above I said, "To make a long story short, the number of digits you want is the predefined constant FLT_DECIMAL_DIG or DBL_DECIMAL_DIG." These constants are literally the minimum number of significant digits required to take an internal floating-point value, convert it to a decimal (string) representation, convert it back to an internal floating-point value, and get exactly the same value back. That's obviously precisely what we want here. There's another, seemingly similar, pair of constants, FLT_DIG and DBL_DIG, which give the minimum number of digits you're guaranteed to preserve if you convert from an external, decimal (string) representation, to an internal floating-point value, and then back to decimal. For typical IEEE-754 implementations, FLT_DIG/DBL_DIG are 6 and 15, while FLT_DECIMAL_DIG/DBL_DECIMAL_DIG are 9 and 17. See this SO answer for more on this.
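The difference between the two pairs of constants can be checked directly. The sketch below is an illustration added here, not code from the original answer; the helper name survives_round_trip is hypothetical, and it assumes the requested digit count is small enough to fit the 64-byte buffer.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints d with `digits` significant digits, reads the
 * string back, and returns 1 if the value survived unchanged, 0 otherwise. */
int survives_round_trip(double d, int digits)
{
    char str[64];
    snprintf(str, sizeof str, "%.*g", digits, d);
    return strtod(str, NULL) == d;
}
```

On an IEEE-754 implementation, take x = 0x1.3333333333334p-2, the double usually produced by 0.1 + 0.2. With DBL_DIG (15) digits it prints as 0.3, which reads back as the neighbouring double, so survives_round_trip(x, DBL_DIG) is 0, while survives_round_trip(x, DBL_DECIMAL_DIG) is 1.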
FLT_DECIMAL_DIG and DBL_DECIMAL_DIG are the minimum number of digits necessary to guarantee a round-trip binary-to-decimal-to-binary conversion, but they are not necessarily enough to show precisely what the actual, internal, binary value is. For those you might need as many decimal digits as there are binary bits in the significand. For example, if we start with the decimal number 123.456, and convert it to float, we get something like 123.45600128... . If we print it with FLT_DECIMAL_DIG or 9 significant digits, we get 123.456001, and that converts back to 123.45600128..., so we've succeeded. But the actual internal value is 7b.74bc8 in base 16, or 1111011.01110100101111001 in binary, with 24 significant bits. The actual, full-precision decimal conversion of those numbers is 123.45600128173828125.
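The two renderings of 123.456f discussed above can be produced side by side. This is a small sketch rather than part of the original answer; the helper name show_float and its parameters are hypothetical, and the "exact" output relies on a printf that generates correctly rounded digits at any precision (glibc does; other C libraries may not).

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: writes two decimal renderings of f.
 * brief: FLT_DECIMAL_DIG significant digits, enough to round-trip.
 * exact: `places` digits after the decimal point; if `places` is at least
 *        the number of fraction bits in use, this shows the stored value
 *        exactly (given a correctly rounding printf). */
void show_float(float f, int places,
                char *brief, size_t brief_size,
                char *exact, size_t exact_size)
{
    snprintf(brief, brief_size, "%.*g", FLT_DECIMAL_DIG, f);
    snprintf(exact, exact_size, "%.*f", places, f);
}
```

Calling show_float(123.456f, 17, ...) uses 17 places because the value above has 17 binary digits after the point, and each binary fraction digit needs at most one more decimal place.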
Addendum:
It must be noted that accurately transmitting floating-point values as decimal strings in this way does absolutely demand:
1. A well-constructed floating-point-to-decimal-string converter (e.g. sprintf with %g). When converting N bits to M digits, they must always be M properly rounded digits.
2. Sufficient digits (FLT_DECIMAL_DIG or DBL_DECIMAL_DIG, as discussed above).
3. A well-constructed decimal-string-to-floating-point converter (e.g. strtod()). When converting N digits to M bits, they must always be M properly rounded bits.
The IEEE-754 standard does require properties (1) and (3). But implementations not conforming to IEEE-754 might not do so well. (It turns out that property (1), in particular, is remarkably difficult to achieve, although techniques for doing so are now well understood.)
Addendum 2:
I have performed empirical tests using a modification of the above program, looping over many values, not just individual ones scanf'ed from the user.
In this "regression test" version, I have replaced the test
if(d1 != d2)
    printf("whoops, they don't match!\n");
with
if(d1 != d2 && !(isnan(d1) && isnan(d2)))
    printf("whoops, they don't match!\n");
(That is, when the numbers don't match, it's an error only if one of them is not a NaN.)
Anyway, I have tested all 4,294,967,296 values of type float.
I have tested 100,000,000,000 randomly-selected values of type double (which is, to be fair, a tiny fraction of them).
Not once (except for deliberately-induced errors, to test the tests) have I seen it print "whoops, they don't match!".
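The author's test program is not shown, so the following is only a sketch of what such an exhaustive float test could look like. The function name count_float_mismatches and its stride parameter are hypothetical, and the code assumes float is a 32-bit type (checked at compile time with C11 _Static_assert).

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Counts round-trip failures among the float bit patterns
 * first, first + step, first + 2*step, ... up to UINT32_MAX. */
unsigned long count_float_mismatches(uint32_t first, uint32_t step)
{
    unsigned long failures = 0;
    char str[FLT_DECIMAL_DIG + 10];
    uint64_t bits;              /* 64-bit counter so the loop cannot wrap */

    if (step == 0)
        step = 1;               /* avoid an endless loop */

    for (bits = first; bits <= UINT32_MAX; bits += step) {
        uint32_t pattern = (uint32_t)bits;
        float f1, f2;

        memcpy(&f1, &pattern, sizeof f1);   /* reinterpret bits as float */
        snprintf(str, sizeof str, "%.*g", FLT_DECIMAL_DIG, f1);
        f2 = strtof(str, NULL);

        /* A mismatch is acceptable only when both values are NaN. */
        if (f1 != f2 && !(isnan(f1) && isnan(f2)))
            failures++;
    }
    return failures;
}
```

count_float_mismatches(0, 1) walks every bit pattern, which takes a while; a larger step such as 65537 gives a quick spot check across the whole range.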
Answer 2
Score: 3
Every printf() / scanf() (/ strtod()) implementation that is not utterly outdated (and, thus, buggy) should be able to make the round trip without loss of precision. It is important, though, that you compare the floating-point representation before and after the round trip, not what is printed as a string. An implementation is perfectly allowed to print an approximation of the binary value, as long as it unambiguously identifies that binary value. (Note that there are many more possible decimal representations than binary ones.)
If you are interested in the details of how this is done, the algorithm is called Dragon 4. A nice introduction on the subject is available here.
If you don't care too much about readability of the string, go for the %a conversion specifier. This prints / reads the float's mantissa as hexadecimal (with the exponent, a power of two, written in decimal). This avoids the binary / decimal conversion altogether. You also do not need to worry about specifying how many digits of precision should be printed, as the default is to print the precise value.
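A round trip through %a might look like the sketch below. The two helper names are hypothetical; the calls themselves are standard C99 (printf's %a and strtod's acceptance of hexadecimal floating-point input).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes d as a hexadecimal floating-point string
 * (for example something like "0x1.999999999999ap-4" for 0.1).
 * Returns 0 on success, -1 if the buffer is too small or formatting failed. */
int double_to_hex_string(double d, char *str, size_t size)
{
    int len = snprintf(str, size, "%a", d);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}

/* strtod accepts the "0x...p..." form (C99 and later), so reading the
 * string back needs no special code. */
double hex_string_to_double(const char *str)
{
    return strtod(str, NULL);
}
```

The exact spelling of the string can differ between C libraries (for instance in how the leading digit is normalized), but any conforming strtod should read each variant back to the same value.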
Answer 3
Score: 1
> I would like to know the easiest, most portable and generally considered best practice to achieve this, that works for any number. I also would like the string associated with the number to be in decimal representation, without scientific notation if possible.
... works for any number
This is challenging to do well in general. Unusual considerations that need assessment include:
- Floating-point formats that have multiple encodings for a single value: e.g. double-double encoding, using 2 double to represent a long double. This format also has additional issues for non-canonical pairings.
- Not-a-numbers with a payload.
- Negative zero.
- Floating point with padding. example.
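Special values such as NaNs and signed zeros defeat a plain == comparison when checking a round trip: NaN never compares equal to anything, and +0.0 == -0.0 is true even though the sign differs. A comparison helper along these lines can be used instead; it is a sketch added here, the name same_after_round_trip is hypothetical, and it deliberately ignores NaN payloads.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if b is an acceptable round-trip result
 * for a, 0 otherwise. Any NaN matches any other NaN (payloads are not
 * compared), and +0.0 and -0.0 are told apart by their sign bit. */
int same_after_round_trip(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    if (a == 0.0 && b == 0.0)
        return !signbit(a) == !signbit(b);
    return a == b;
}
```

Preserving NaN payloads or padding bytes needs a bit-level comparison (for example memcmp on the object representation), which this helper does not attempt.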
Without scientific notation
Some quality standard libraries will perform high-precision text conversions without significant loss.
double x = -DBL_TRUE_MIN;
#define PRECISION_NEED (DBL_DECIMAL_DIG - DBL_MIN_10_EXP - 1)
// Buffer is sized for a single integer digit, i.e. |x| < 10.
//             sign 1   .   fraction         \0
#define BUF_N (1 +  1 + 1 + PRECISION_NEED + 1)
char buf[BUF_N];
sprintf(buf, "%.*f", PRECISION_NEED, x);
if (atof(buf) == x) ...
Or you can code it yourself, yet that is not simple.
Best practice
Use sprintf(large_enough_buffer, "%.*g", DBL_DECIMAL_DIG, x) as suggested by many as the first step.
Answer 4
Score: 0
Converting floating-point numbers to decimal is not an exact process (unless you use very long strings, see comments), nor is doing the converse. If it's important that the floating-point numbers read back are exactly the same, bit for bit, then you need to preserve the binary representation, possibly as a hex string as shown below. This preserves non-numerical values like NAN and +-INF. The hex string can safely be written to memory or a file.
If you need it to be human readable, then you could invent your own string format which uses both, for example by prepending the hex representation to the decimal string. Then when the number is converted back to a float, it will use the hex value, not the decimal value, and so will have exactly the same value as the original. The hex string has a fixed length of two characters per byte (8 characters for a 4-byte float), so it is not so expensive. As others have pointed out, it can be non-obvious to predict the size of the buffer needed to printf a float or double, especially if you want no loss of precision. See others' comments and answers for options and hazards on how to print a human-readable representation.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <math.h>

/********************************************************************************************
// Floating Points As Hex Strings
// ==============================
// Author: Simon Goater May 2023.
//
// The binary representation of floats must be same for source and destination floats.
// If the endianess of source and destination differ, the hex characters must be
// permuted accordingly.
*/
typedef union {
    float f;
    double d;
    long double ld;
    unsigned char c[16];
} fpchar_t;

const unsigned char hexchar[16] = {0x30, 0x31, 0x32, 0x33,
                                   0x34, 0x35, 0x36, 0x37,
                                   0x38, 0x39, 0x41, 0x42,
                                   0x43, 0x44, 0x45, 0x46};
const unsigned char binchar[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0,
                                   0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};

void fptostring(void* f, unsigned char* string, uint8_t sizeoffp) {
    fpchar_t floatstring;
    memcpy(&floatstring.c, f, sizeoffp);
    int i, stringix;
    stringix = 0;
    unsigned char thischar;
    for (i=0; i<sizeoffp; i++) {
        thischar = floatstring.c[i];
        string[stringix] = hexchar[thischar >> 4];
        stringix++;
        string[stringix] = hexchar[thischar & 0xf];
        stringix++;
    }
}

void stringtofp(void* f, unsigned char* string, uint8_t sizeoffp) {
    fpchar_t floatstring;
    int i, stringix;
    stringix = 0;
    for (i=0; i<sizeoffp; i++) {
        floatstring.c[i] = binchar[(string[stringix] - 0x30) % 23] << 4;
        stringix++;
        floatstring.c[i] += binchar[(string[stringix] - 0x30) % 23];
        stringix++;
    }
    memcpy(f, &floatstring.c, sizeoffp);
}

_Bool isfpstring(void* f, unsigned char* string, uint8_t sizeoffp) {
    // Validates the floatstring and if ok, copies value to f.
    int i;
    for (i=0; i<2*sizeoffp; i++) {
        if (string[i] < 0x30) return false;
        if (string[i] > 0x46) return false;
        if ((string[i] > 0x39) && (string[i] < 0x41)) return false;
    }
    stringtofp(f, string, sizeoffp);
    return true;
}
/********************************************************************************************
// Floating Points As Hex Strings - END
// ====================================
*/

int main(int argc, char **argv)
{
    //float f = 1.23f;
    //double f = 1.23;
    long double f = 1.23;
    if (argc > 1) f = atof(argv[1]);
    unsigned char floatstring[33] = {0};
    //printf("fpval = %.32f\n", f);
    printf("fpval = %.32Lf\n", f);
    fptostring((void*)&f, (unsigned char*)floatstring, sizeof(f));
    printf("floathex = %s\n", floatstring);
    f = 1.23f;
    //floatstring[0] = 'a';
    if (isfpstring((void*)&f, (unsigned char*)floatstring, sizeof(f))) {
        //printf("fpval = %.32f\n", f);
        printf("fpval = %.32Lf\n", f);
    } else {
        printf("Error converting floating point from hex.\n");
    }
    exit(0);
}
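A variation on the same idea, offered here as a sketch rather than as the answer author's code, is to copy the double into a uint64_t and format that integer. Because the integer is formatted by value, the resulting string does not depend on byte order, provided both sides use the same 64-bit floating-point format and the same endianness for integers and floats (an assumption that holds on common platforms). The function names are hypothetical.

```c
#include <ctype.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit double");

/* Writes the 16 hex digits of d's bit pattern, plus '\0', into str. */
void double_to_bits_string(double d, char str[17])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    snprintf(str, 17, "%016" PRIx64, bits);
}

/* Reads such a string back. Returns 0 on success, or -1 if str is not
 * exactly 16 hexadecimal digits. */
int bits_string_to_double(const char *str, double *d)
{
    uint64_t bits;
    int i;

    for (i = 0; i < 16; i++) {
        if (!isxdigit((unsigned char)str[i]))   /* also stops at '\0' */
            return -1;
    }
    if (str[16] != '\0')
        return -1;

    bits = strtoull(str, NULL, 16);
    memcpy(d, &bits, sizeof *d);
    return 0;
}
```

Like the union-based version, this keeps NaN payloads and the sign of zero, since no numeric conversion takes place.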
Answer 5
Score: 0
As a general rule, this can be impossible to achieve, because two different implementations can decode the same string into different floating-point values. The reason is that the correspondence between a number's decimal ASCII representation and its internal binary representation is not bijective. Sometimes a decimal floating-point number (e.g. 0.1) has no finite representation as a binary number (0.1 decimal converts into 0.00011001100110011001100110011001100... binary) and so cannot be represented as a finite bit sequence (just as dividing 1.0 by 3.0 gives the infinite decimal sequence 0.333333333333...).
Converting a finite binary number to decimal is always possible: every finite floating-point number (one with a finite binary representation) always results in a finite (although possibly very long) decimal string. This means that there are more finite decimal strings than finite binary representations. Consequently the decimal-to-binary mapping is always many-to-one, with several finite decimal representations mapped to the same binary image.
This could be handled by the implementation. Since the correspondence from binary to decimal is injective, and every binary value can be converted, we can build an inverse that maps the resulting decimal representation back to the original value (we are dealing with finite sets, so at the very least we can do it case by case). For example, the inverse could map every decimal number to the binary value closest to it. But there is another drawback that makes this mapping hard to build. An arbitrary finite-length binary string always maps to a finite-length decimal string, but representing a binary fraction with full decimal precision requires about one decimal significant digit per binary digit, so
0.1(bin) --> 0.5(dec) (one digit each)
while
0.0001(bin) --> 0.0625(dec) (four digits after the decimal point)
1.0 * 2^-32 --> 0.00000000023283064365386962890625 (23 significant digits after the decimal point)
and growing. Keeping the computation bounded (in both the decimal and binary number systems) requires rounding. A number may be rounded to the nearest decimal string (using decimal rounding), but when the string is read back into the computer, the closest binary value (this time using binary rounding, or the closest-value approach described above) can be the next or the previous number relative to the original one, so the retrieved number differs from the one that was saved.
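The growth in digit count can be observed directly. Since 2^-n equals 5^n / 10^n, its decimal expansion ends after exactly n places. The sketch below is an illustration added here; the function name is hypothetical, and the exact digits depend on a printf that produces correctly rounded output at any precision (glibc does; other libraries may stop early or pad with zeros).

```c
#include <stdio.h>

/* Hypothetical helper: writes the decimal expansion of 2^-n, with n digits
 * after the decimal point, into str. Returns what snprintf returns, i.e.
 * the number of characters the full result needs (excluding the '\0'). */
int print_negative_power_of_two(int n, char *str, size_t size)
{
    double v = 1.0;
    int i;

    for (i = 0; i < n; i++)
        v *= 0.5;               /* halving is exact in binary floating point */

    return snprintf(str, size, "%.*f", n, v);
}
```

For n = 32 the string needs 34 characters ("0." plus 32 digits), which shows why exact decimal output quickly becomes long even for values that take only a single bit of significand.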
But... you can consider saving a number in ASCII binary form.
This way, you guarantee that the stored number will be exactly the same as the original one (because both conversions work in the same number base, which makes the correspondence bijective). Such a conversion should be easy to write, and it gives you a portable and exact serialization of binary floating-point numbers. It can be done in a bounded and exact way, so you never incur rounding errors, and your data is guaranteed to be saved and later restored successfully.
In today's architectures, IEEE-754 is the widely used standard for internal binary floating-point representation. So a simple mapping, such as writing the bytes in hexadecimal from the byte holding the sign down to the byte holding the least significant bit of the significand, is a good and efficient starting point. Another good conversion is base64 encoding of the big-endian IEEE-754 binary representation (as described above), which lets you encode any double (including NaNs and infinities) into 11 ASCII characters, or a float into 6 ASCII characters (without padding), in an architecture-independent way.