Padding in Character Arrays in C/C++

Question

Multiple sources across the internet, including this question on Stack Overflow, suggest that there will never be any padding between the elements of an array in C.

However, according to the 2nd Edition of Compilers: Principles, Techniques, and Tools, page 428 (logical) or 453 (physical):

> On many machines, instructions to add integers may expect integers to be aligned, that is, placed at an address divisible by 4. Although a character array (as in C) of length 10 needs only enough bytes to hold ten characters, a compiler may allocate 12 bytes to get the proper alignment, leaving 2 bytes unused.

To verify this, I wrote a small C++ program that prints the addresses of the char array elements, and it shows no padding between them.

EDIT: My question was whether padding can exist between array elements. The answers have explained that the padding mentioned in the book comes after the end of the array. Thanks!

Answer 1

Score: 3

> Although a character array (as in C) of length 10 needs only enough bytes to hold ten characters, a compiler may allocate 12 bytes to get the proper alignment, leaving 2 bytes unused.

That is a misleading statement. If at file scope you declare

int before;
char my_array[10];
int after;

then the size of the array is 10. Period. The compiler might leave unassigned space before or after the storage for my_array, but that's at the compiler's discretion, and such extra space is not part of the array (and I don't usually refer to it as "padding" in such cases, but YMMV).

If you declare a structure containing an array:

struct my_struct {
    int before;
    char my_array[10];
    int after;
};

then the compiler may lay out the structure with padding between my_array and after, but the size of the array is still 10. The padding belongs to the structure, not to the array.

Similar reasoning applies to arrays whose element type itself has an alignment requirement larger than 1 byte. The start of the array may be placed so that there is unused space before it, but that space is not part of the array.

> To verify this, I wrote a small C++ program to print the addresses of the char array elements, and there is no padding.

There will definitely be no padding between elements of the same array. Both C and C++ are very clear on this, and the source you quote does not say otherwise.

Answer 2

Score: 2

The issue you are referring to is related to structure padding, rather than padding between elements of an array in C or C++. Padding in this context refers to the insertion of unused bytes between structure members to align them properly in memory.

In the case of arrays, elements are stored consecutively in memory without any padding between them. Each element occupies the exact amount of space required for its data type.

The passage you mentioned from the book "Compilers: Principles, Techniques, and Tools" is discussing structure padding, which is not directly related to arrays. When dealing with structures, compilers may insert padding bytes between members to align them according to the alignment requirements of the target machine architecture. This alignment can improve performance, as some processors may have alignment restrictions for certain data types.

Answer 3

Score: 1

The text you quote talks about (a) alignment needed for some integer type and (b) padding after a character array. This mention of two different types suggests the padding is not for the character array but is for some integer type. A clue for this is in the paragraph prior to your quote, which ends:

> Storage for an aggregate type, such as an array or structure, must be large enough to hold all its components.

So the quote may be discussing padding within aggregates generally. A character array does not need any padding by itself, but if it is in a structure that also has an integer member, padding may be needed after the character array so that the next member of the structure has the alignment it needs (or so that the size of the whole structure is a multiple of the alignment requirements of its members). The quote may also be discussing the placement of multiple objects generally, as when the compiler arranges several objects in stack space or other storage.

huangapple
  • Posted on June 8, 2023 at 22:32:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76432936.html