Detecting endianness type at runtime – how many bytes are needed for conclusive results?

Question

I want to be able to detect the endianness of my system at runtime, programmatically.

In this question, there is an example of a function that uses 4 bytes to distinguish 4 main types of endianness: BIG, LITTLE, BIG WORD, LITTLE WORD.

#include <stdint.h>

/* result codes */
enum { ENDIAN_UNKNOWN, ENDIAN_BIG, ENDIAN_LITTLE, ENDIAN_BIG_WORD, ENDIAN_LITTLE_WORD };

int endianness(void)
{
  uint8_t buffer[4];

  buffer[0] = 0x00;
  buffer[1] = 0x01;
  buffer[2] = 0x02;
  buffer[3] = 0x03;

  /* Reinterpret the four bytes as a 32-bit integer and match the resulting value. */
  switch (*((uint32_t *)buffer)) {
  case 0x00010203: return ENDIAN_BIG;
  case 0x03020100: return ENDIAN_LITTLE;
  case 0x02030001: return ENDIAN_BIG_WORD;
  case 0x01000302: return ENDIAN_LITTLE_WORD;
  default:         return ENDIAN_UNKNOWN;
  }
}

My question is: are 4 bytes enough to conclude the endianness, or should one use more to be extra careful about future inventions (such as BIG and LITTLE arrangements over groups of 3 or 4 bytes)?

My concern is that some unholy variant of endianness might produce the same byte order as one of those shown above while, under the hood, actually being something different.

That being said, I feel it might not matter as long as the results are consistent. For instance, if the longest variable in my program is 4 bytes and it reliably produces the same signature as in the function above, then it shouldn't be a problem.

I am specifically asking about the kind of test shown in the example above.

Answer 1

Score: 2

What you're testing for should be sufficient; however, the way in which you're testing it can trigger undefined behavior, because you're accessing an lvalue of one type through an lvalue of a different type. You can sidestep this by using a union:

#include <stdint.h>

/* Reuses the ENDIAN_* constants from the question. */
union endian {
  uint8_t bytes[4];
  uint32_t word;
};

int endianness(void)
{
  /* Reading a different union member is a well-defined way to type-pun in C. */
  union endian test = { .bytes = { 0, 1, 2, 3 } };

  switch (test.word) {
  case 0x00010203: return ENDIAN_BIG;
  case 0x03020100: return ENDIAN_LITTLE;
  case 0x02030001: return ENDIAN_BIG_WORD;
  case 0x01000302: return ENDIAN_LITTLE_WORD;
  default:         return ENDIAN_UNKNOWN;
  }
}
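
For example, a minimal test driver (not part of the original answer) could print the result; this sketch assumes the ENDIAN_* constants are defined in the same translation unit, in the order shown with the question's snippet above:

#include <stdio.h>

int main(void)
{
  /* Names indexed by the ENDIAN_* values; the order must match that enum. */
  static const char *const names[] = {
    "unknown", "big-endian", "little-endian", "big-endian word", "little-endian word"
  };

  printf("detected byte order: %s\n", names[endianness()]);
  return 0;
}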

Answer 2

Score: 2

What the C standard says about the order of bytes in memory is in C 2018 6.2.6.1 2:

> Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

This does not say there is any relationship between the order of bytes in a short and the order of bytes in an int, or in a long, a long long, a double, or other types. It does not say the order is constrained to only certain permissible orders, such as that one of the four orders you list must be used. There are 4! = 24 ways to order four bytes, and it would be permissible, according to the C standard, for a C implementation to use any one of those 24 for a four-byte int, and for the same C implementation to use any one of those 24, the same or different, for a four-byte long.

To fully test what orders a C implementation is using, you would need to test each byte in each type of object bigger than one byte.
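
A minimal sketch of that idea (not from the original answer): store a value with a distinct byte in each significance position, copy the object representation out with memcpy (which sidesteps the aliasing problem), and report where each byte landed. The probe_uint32 name is made up for this illustration; the same pattern would have to be repeated for short, long, long long, and so on.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints, for each memory byte of a uint32_t, its significance position:
   0 is the most significant byte, 3 the least significant. A big-endian
   order prints "0 1 2 3", little-endian prints "3 2 1 0", and the two
   word-swapped orders print "2 3 0 1" and "1 0 3 2". */
static void probe_uint32(void)
{
  uint32_t value = 0x00010203;           /* each byte's value is its own significance index */
  unsigned char bytes[sizeof value];

  memcpy(bytes, &value, sizeof value);   /* examine the object representation safely */

  printf("uint32_t byte order:");
  for (size_t i = 0; i < sizeof value; i++)
    printf(" %u", (unsigned)bytes[i]);
  printf("\n");
}

int main(void)
{
  probe_uint32();
  return 0;
}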

In most C implementations, it suffices to assume bytes are in big-endian order (most significant byte first, then bytes in order of decreasing significance) or little-endian order (the reverse). In some C implementations, there may be a hybrid order due to the history of the particular C implementation—for example, its two-byte objects might have used one byte order due to hardware it originally ran on while its four-byte objects were constructed in software from two-byte objects that were ordered based on the programmer’s choice.

A similar situation can arise with larger objects, such as a 64-bit double stored as two 32-bit parts.
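
The same kind of probe makes such a split visible for double as well. The sketch below (again only an illustration, with a made-up probe_double name, called the same way as probe_uint32 above) assumes IEEE 754 binary64, in which 1.0 has the big-endian byte image 3f f0 00 00 00 00 00 00:

#include <stdio.h>
#include <string.h>

/* Prints the object representation of the double 1.0. With IEEE 754 binary64,
   pure big-endian gives 3f f0 00 00 00 00 00 00, pure little-endian the exact
   reverse, and a layout built from two word-swapped 32-bit halves puts the
   3f/f0 pair in the interior of the object rather than at either end. */
static void probe_double(void)
{
  double value = 1.0;
  unsigned char bytes[sizeof value];

  memcpy(bytes, &value, sizeof value);

  printf("double 1.0 bytes:");
  for (size_t i = 0; i < sizeof value; i++)
    printf(" %02x", (unsigned)bytes[i]);
  printf("\n");
}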

However, variants with other orders, such as the bytes 0, 1, 2, and 3 (denoted by significance) stored in the order 3, 0, 1, 2, would arise only in perverse C implementations that technically conform to the C standard but do not serve any practical purpose. Such possibilities can be ignored in ordinary code.

To explore all possibilities, you must also consider the order in which bits are stored within the bytes of an object. The C standard requires that “the same bits” be used for the same meaning only between corresponding signed and unsigned types, in C 2018 6.2.6.2 2:

> … Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type…

Thus, a C implementation in which bits 3 and 4 of the first byte of an int represented 2^3 and 2^4 but represented 2^4 and 2^3 in a long would technically conform to the C standard. While this seems odd, note that the fact that the standard constrains this specifically for corresponding signed and unsigned types, but not for other types, suggests there were C implementations that assigned different meanings to corresponding bits in different types.
