How to interpret paragraph 1 of section 6.3.1.4 of the C11 standard (about converting float to unsigned int)

Question

My C11 standard is from here. This paragraph says:

> When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.[61]

and footnote 61 says:

> The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1).

My confusion is mainly about unsigned int. My current understanding is the following:

    float    a = 3.14;
    uint32_t b = (uint32_t)a; // defined, b == 3

    float    a = -1.23;
    uint32_t b = (uint32_t)a; // UB!

    float    a = 2147483646.0; // defined
    uint32_t b = (uint32_t)a;  // defined, b == 2147483646
    uint8_t  c = (uint8_t)a;   // UB!

Is this correct?

Answer 1

Score: 1

Footnote 61 clarifies the range of floating-point values that can be cast to an unsigned integer type without undefined behavior.

An unsigned integer type can represent values in the range [0; Utype_MAX]. Hence any floating-point value whose integer part lies in this interval can be cast to that unsigned integer type, which means values x where x > -1 and x < Utype_MAX+1. This is the statement of the last part of footnote 61.

The general rule is that when an operation on unsigned integers produces a number outside the range [0; Utype_MAX], the result is reduced modulo Utype_MAX+1 (also referred to as "wrap-around"). For example, adding two 16-bit unsigned integers, 40000+40000=80000, gives a value that is not representable in 16 bits, so the result is reduced modulo 65536 to 14464.

However, this wrap-around does not need to be done when casting a floating-point number to an unsigned integer. This is the first statement in footnote 61.
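
As a rough illustration of that contrast, the C sketch below puts the defined wrap-around of an integer conversion next to the range check a floating-point conversion needs. The names `wrap_add_u16` and `fits_u16` are invented for this example, and it assumes the implementation provides `uint16_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer case: the sum is computed in a wider type and then converted to
   uint16_t. That conversion is defined and reduces the value modulo 65536. */
uint16_t wrap_add_u16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (uint16_t)sum;
}

/* Floating case: no reduction is guaranteed, so the caller has to check the
   portable range (-1, UINT16_MAX + 1) before converting. NaN fails both
   comparisons and is therefore rejected as well. */
bool fits_u16(double x)
{
    return x > -1.0 && x < 65536.0;
}
```

With these helpers, `wrap_add_u16(40000, 40000)` yields 14464, while `fits_u16(80000.0)` reports that casting 80000.0 to `uint16_t` would be undefined.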

Answer 2

Score: 1

Your question is exactly this:

> Say uint8_t a = (uint8_t)123456; is defined given the wrapping around, uint8_t a = (uint8_t)123456.7 is UB as C standard does not require the wrapping. Is this what the standard says?

The language of the Standard seems unambiguous about that and the footnote confirms that the modulo operation that is defined for integer conversions does not necessarily occur for floating point conversions.
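
If the modulo result is actually wanted for a floating-point value, one possible approach is to go through a wide signed integer type first, so that the reduction happens in a defined integer-to-unsigned conversion. The following is only a sketch under that assumption: `wrap_double_to_u8` is a hypothetical helper, not something the Standard provides, and it assumes `int64_t` is available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores in *out the value an integer conversion to
   uint8_t would give for the truncated x. Returns false when x cannot be
   handled this way. */
bool wrap_double_to_u8(double x, uint8_t *out)
{
    /* Step 1: make sure the truncated value fits in int64_t, so that the
       floating-to-integer conversion below is defined. The bounds are
       -2^63 and 2^63; NaN fails the test. */
    if (!(x > -9223372036854775808.0 && x < 9223372036854775808.0)) {
        return false;
    }
    int64_t t = (int64_t)x;   /* fractional part discarded, toward zero */
    *out = (uint8_t)t;        /* integer-to-unsigned: reduced modulo 256 */
    return true;
}
```

For 123456.7 this stores 64, the same value as `(uint8_t)123456`, and for -1.23 it stores 255.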

This text was already present in the C99 version of the C Standard (with a different footnote number), and also in the C90 version (aka ANSI C) without a reference to the _Bool type.

The reason for this apparent semantic inconsistency in the C Standard is probably the concern to keep existing implementations and hardware behavior compatible with the Standard. It may be linked to the binary representation of negative floating point numbers: while all but some ancient architectures have long used two's complement representation for signed integers (this is actually mandated by the latest C23 Standard), floating point numbers generally use sign + magnitude representations. The modulo semantics of signed integer to unsigned integer conversions cost nothing on two's complement representations, but would require extra silicon for floating point values, which was not present on all hardware implementations at the time. The Standard Committee decided to keep these cases undefined, both for conversions such as (uint32_t)-1.23 and for the less problematic (uint8_t)123456.7, to avoid requiring compiler writers to produce extra costly code to fix the behavior on hardware that does not already implement the modulo semantics.

Note that C23 has a slightly different specification for the conversion from floating point to integer types:

> 6.3.1.4 Real floating and integer
>
>1 When a finite value of standard floating type is converted to an integer type other than bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.<sup>66)</sup>
>
>2 When a finite value of decimal floating type is converted to an integer type other than bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the "invalid" floating-point exception shall be raised and the result of the conversion is unspecified.
>
> Footnote: <sup>66)</sup> The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX + 1).

The behavior is more explicit for conversions from decimal floating point representations to integer: a floating point exception must be raised if the value is not representable in the target type, which seems a very strong constraint as there are at least 8 and possibly more integral types to handle specifically, not counting the bit-precise integer types...

Answer 3

Score: 0

The range specifier (−1, U_type_MAX+1) is exclusive. That is to say, the specified endpoints are not part of the range itself. So the lowest floating-point value that can be converted to the given unsigned type is the next representable value after -1 towards zero (something like -0.999999940395 for an IEEE-754 float). Similarly, the highest is the next representable value below U_type_MAX+1, which is truncated to U_type_MAX when the floating type has enough precision to represent it, and to a smaller in-range value otherwise.

Looking at your examples:

  1. 3.14 will be truncated to 3, which is clearly representable as a uint32_t.
  2. -1.23 will be truncated to -1, which is not representable by any unsigned type, so that conversion is undefined behaviour.
  3. The maximum representable value of a uint32_t is 4294967295, so your trial value of 2147483646 is perfectly well defined for conversion to that type; however, the maximum value for a uint8_t is 255, so conversion to that type is undefined behaviour.

To add another example, conversion from -0.999999940395 to uint32_t will be well-defined because that value will first be truncated, yielding zero, which is representable by any unsigned type.
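
The exclusive range can be written down as a small check. The sketch below is illustrative only: `fits_u32` and `question_value` are names made up here, and the comment about rounding assumes `float` is IEEE-754 binary32 with round-to-nearest conversion of constants, which is common but not required by the Standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when converting x to uint32_t is defined: x must lie in the open
   interval (-1, 2^32). NaN fails both comparisons and is rejected. */
bool fits_u32(double x)
{
    return x > -1.0 && x < 4294967296.0;
}

/* The question's third example stored in a float. Under the binary32
   assumption, 2147483646 is not representable and the nearest float is
   2147483648.0 (2^31), which is still inside the range of uint32_t. */
float question_value(void)
{
    return 2147483646.0f;
}
```

Under those assumptions the conversion in the question's third example remains defined, but `b` ends up as 2147483648 rather than 2147483646, because the rounding already happens when the constant is stored in the `float`.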

Answer 4

Score: 0

The Standard imposes no requirements on what implementations do when converting an out-of-range floating-point value to unsigned int. For some purposes, it may be most useful for implementations to "peg" to UINT_MAX, for some it may be most useful for implementations to use wraparound semantics, and for some it may be most useful to trigger a trap that raises a signal, terminates the program, or otherwise acts to prevent the results from invalid computations from being mistaken for valid data.

If an implementation processes conversions to unsigned with wraparound semantics, it would probably be most useful for it to process conversions to smaller unsigned sizes likewise. If it traps such conversions with unsigned, however, having it trap out-of-range conversions to smaller values would likely be more useful than using wrap-around semantics for values within range of unsigned int but trapping semantics outside that range. The Standard gives implementations the freedom to behave in whichever way is more useful, on the presumption that implementations wouldn't use such freedom to process out-of-range conversions to smaller types in a way that's gratuitously more weird than conversions to larger types.
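
Code that wants one of these behaviours regardless of the implementation can spell it out itself. As a minimal sketch of the "peg" option, assuming the caller is happy with NaN mapping to 0 (an arbitrary choice made here), a hypothetical `sat_double_to_u32` could look like this:

```c
#include <stdint.h>

/* Hypothetical saturating conversion: out-of-range inputs are pegged to the
   nearest end of the uint32_t range instead of invoking undefined behaviour. */
uint32_t sat_double_to_u32(double x)
{
    if (x != x) {
        return 0;                 /* NaN: mapped to 0 by assumption */
    }
    if (x <= -1.0) {
        return 0;                 /* below the portable range */
    }
    if (x >= 4294967296.0) {
        return UINT32_MAX;        /* at or above 2^32 */
    }
    return (uint32_t)x;           /* inside (-1, 2^32): defined, truncates */
}
```

A wrapping or trapping variant would follow the same shape, with only the out-of-range branches changed.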

huangapple
  • Posted on February 27, 2023, 17:53:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75578931.html