Flexibly set floating point number precision at compile time
Question
I have a C++ program that can be compiled for single or double precision floating point numbers. Similar to what is explained here (https://stackoverflow.com/questions/14511910/switching-between-float-and-double-precision-at-compile-time), I have a header file which defines:
typedef double dtype;
or:
typedef float dtype;
depending on whether single or double precision is required by the user. When declaring variables and arrays I always use the data type dtype, so the correct precision is used throughout the code.
My question is: how can I, in a similar fashion, set the data type of hard-coded numbers in the code, as for instance in this example:
dtype var1 = min(var0, 3.65);
As far as I know, 3.65 is double precision by default and will be single precision if I write:
dtype var1 = min(var0, 3.65f);
But is there a way to define a literal, for instance like this:
dtype var1 = min(var0, 3.65_dt);
that can be defined as either float or double at compile time, to ensure that hard-coded numbers in the code also have the right precision?
Currently, I cast the number to dtype like this:
dtype var1 = min(var0, (dtype)3.65);
but I am concerned that this might create overhead in the single-precision case, since the program might actually create a double precision number which is then cast to a single precision number. Is this indeed the case?
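For reference, a minimal sketch of the setup described above (the USE_SINGLE_PRECISION switch and the function name are placeholders chosen for illustration, not part of the original code):

// precision.h (hypothetical name): select the floating point type at build time
#ifdef USE_SINGLE_PRECISION
typedef float dtype;
#else
typedef double dtype;
#endif

#include <algorithm>

// The cast approach from the question: 3.65 is still parsed as a double
// and is only converted to dtype by the cast.
dtype example(dtype var0) {
    return std::min(var0, (dtype)3.65);
}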
Answer 1
Score: 1
You can do this with a macro that appends an f suffix for float, as with #define foo(x) x##f, and does not for double, as with #define foo(x) x.
While you can also coerce constants to become float values with casts or various induced conversions, this creates a double-rounding process: the literal in the source text is first converted to double and then converted to float. In about one instance in 2^29, this produces a different result than if the literal is directly converted to float.

(2^29 is due to the difference in the numbers of bits in the significands of the formats commonly used for float and double, 24 and 53. This assumes a uniform distribution for the bit patterns in the representation. Practical data may have a different distribution.)
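To make the double rounding concrete, here is a small test sketch (not from the answer; the decimal value is one constructed so that it lies just above the float rounding boundary at 1 + 2^-24 yet rounds to exactly 1 + 2^-24 as a double, assuming IEEE round-to-nearest-even):

#include <cstdio>

int main() {
    // Rounded straight to float: the value is above the midpoint 1 + 2^-24,
    // so it rounds up to 1 + 2^-23.
    float direct  = 1.0000000596046447754f;
    // Rounded to double first (exactly 1 + 2^-24), then to float: the tie
    // resolves to even, giving 1.0f.
    float twostep = (float)1.0000000596046447754;
    std::printf("direct : %.9g\n", direct);          // expected: 1.00000012
    std::printf("twostep: %.9g\n", twostep);         // expected: 1
    std::printf("equal  : %d\n", direct == twostep); // expected: 0
    return 0;
}

In the single-precision build this is exactly the difference between a suffixed literal such as foo(3.65) and a (dtype) cast, although, as noted above, it only shows up for roughly one value in 2^29.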