
Object alignment and byte boundaries

Question

Must objects be located at a byte boundary determined by their alignment requirement? For example, given this code:

struct alignas(16) A { int x; }; // alignment requirement is 16

int main()
{
    A a[2];
}

Must a[0] and a[1] be located at a 16-byte boundary? In other words, are the lower four bits of each address guaranteed to be zero?

I would have thought so, but the C++ standard seems to say something different:

> Object types have alignment requirements ([basic.fundamental],
> [basic.compound]) which place restrictions on the addresses at which
> an object of that type may be allocated. An alignment is an
> implementation-defined integer value representing the number of bytes
> between successive addresses at which a given object can be allocated.

(Quoted from the C++ standard, [basic.align].)

If I am reading that correctly, it merely says that the difference between the addresses of a[0] and a[1] must be 16. The addresses themselves could be anything.
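To make that reading concrete, here is a minimal sketch of what the language itself pins down for A: alignof(A) is 16, sizeof(A) is a multiple of it, and consecutive array elements are therefore sizeof(A) bytes apart. The helper name element_stride_in_bytes is hypothetical. Note that nothing here says anything about the absolute address of a[0], which is exactly the gap the question is about.

```cpp
#include <cassert>
#include <cstddef>

// The type from the question.
struct alignas(16) A { int x; };

// Compile-time facts the standard guarantees for A.
static_assert(alignof(A) == 16, "alignas(16) sets the alignment requirement");
static_assert(sizeof(A) % alignof(A) == 0, "sizeof is always a multiple of alignof");

// Hypothetical helper: byte distance between arr[0] and arr[1], observed
// through unsigned char pointers. For an array this is always sizeof(A).
std::ptrdiff_t element_stride_in_bytes(const A (&arr)[2]) {
    const auto* first = reinterpret_cast<const unsigned char*>(&arr[0]);
    const auto* second = reinterpret_cast<const unsigned char*>(&arr[1]);
    assert(second - first == static_cast<std::ptrdiff_t>(sizeof(A)));
    return second - first;
}
```

Called on the array from the question, element_stride_in_bytes(a) returns sizeof(A), which is 16 on mainstream implementations: a statement about distance, not about where a[0] lives.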

This is a far less useful definition (for me, at least), so I thought I should check with the experts.

Answer 1

Score: 4

> In other words, are the lower four bits of each address guaranteed to be zero?

The problem is that the question isn't well-defined. What do you mean by bits of the address? There is no definition for that in the standard.

An address is not an integer value. It is just an abstract property attached uniquely to each byte of storage, with no mathematical structure beyond identity, except that locally, but not necessarily globally, addresses may inherit a linear order from successive bytes of storage. So there are no bits in the sense of a binary representation of the address's value. The linear order does, however, let one talk locally about the distance between bytes of storage or addresses, which is what your quoted definition of alignment uses.

Maybe you mean bits in the sense of the object representation of a pointer value that represents a given address, but besides being implementation-defined, the representation doesn't even need to be the same for different pointer types which can still represent the same address.

Maybe you mean the result of reinterpret_casting a pointer value to an integer value and then looking at its lowest bits in its binary representation, but the mapping of pointer values to addresses in such a cast is also completely implementation-defined, aside from a round-trip conversion giving back the original pointer value.
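The reinterpret_cast route can be sketched as follows. The helper names are hypothetical, and std::uintptr_t is itself an optional type (present on mainstream platforms). The only portable promise is the round trip; what the low bits of the resulting integer look like is implementation-defined, even though on common flat-address-space targets they match the numeric address.

```cpp
#include <cassert>
#include <cstdint>

struct alignas(16) A { int x; };

// Implementation-defined: the standard does not say which integer this
// produces. On common flat-address-space targets it is the numeric address,
// so a 16-aligned object has its low four bits clear, but that is a property
// of the implementation, not a promise of the standard.
bool low_four_bits_zero(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// The portable guarantee: pointer -> std::uintptr_t -> pointer gives back
// the original pointer value.
bool round_trips(const void* p) {
    const std::uintptr_t n = reinterpret_cast<std::uintptr_t>(p);
    const void* back = reinterpret_cast<const void*>(n);
    return back == p;
}
```

round_trips holds everywhere std::uintptr_t exists; low_four_bits_zero only reports what a particular implementation happens to do.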

So the question cannot generally be answered; it is a detail of the C++ implementation, although it should usually be determined by the target machine's architecture, as the standard mentions in a note in [expr.reinterpret.cast]:

> It is intended to be unsurprising to those who know the addressing structure of the underlying machine.

But that can look different on different machines. For example, there doesn't need to be a single flat address space.

Similarly,

> must a[0] and a[1] be located at a 16 byte boundary?

isn't clear on what "boundary" means. However, what the standard does guarantee is that the addresses of both a[0] and a[1] satisfy the alignment 16 and every weaker (i.e. smaller) alignment supported by the implementation.
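One way to ask the implementation itself whether a pointer satisfies a given alignment, without inspecting integer bits by hand, is std::align from &lt;memory&gt;: when the pointer already satisfies the requested alignment, no padding is needed and the pointer comes back unchanged. The sketch below (helper names hypothetical) checks 16 and every weaker power of two for A; alignment values are always powers of two, so this covers every weaker alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct alignas(16) A { int x; };

// Hypothetical helper: with space exactly sizeof(A), std::align returns the
// pointer unchanged if no padding is needed, and nullptr otherwise (the
// object would no longer fit after padding).
bool already_aligned(void* p, std::size_t alignment) {
    void* q = p;
    std::size_t space = sizeof(A);
    return std::align(alignment, sizeof(A), q, space) == p;
}

// Check alignment 16 and every weaker power of two (8, 4, 2, 1).
bool satisfies_alignment_and_weaker(A* obj) {
    for (std::size_t n = alignof(A); n >= 1; n /= 2) {
        if (!already_aligned(obj, n)) {
            return false;
        }
    }
    return true;
}
```

For the array in the question, satisfies_alignment_and_weaker(&a[0]) and satisfies_alignment_and_weaker(&a[1]) should both be true, since the guarantee is stated relative to the implementation's own notion of alignment.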

Answer 2

Score: 4

The alignment requirements described in the C++ standard are designed to accommodate hardware requirements (including efficiency) for objects to be located at addresses that are multiples of some alignment requirement. However, there is no way to express this in strict C++ because the pointer/address semantics specified by the standard do not provide any way to divide addresses or otherwise test whether an address is a multiple of a number.

Because C++ uses a memory model in which objects are composed of bytes, arrays of objects can be formed, and subtraction of pointers is defined, the distances between objects are visible. So the C++ standard describes alignment requirements in terms of these distances, that being the best available description in its model.

To recap, no operation strictly defined by the C++ standard tells you whether an address is a multiple of some alignment requirement A. There are operations that tell you whether two objects (notably two elements of an array) are separated by a multiple of A bytes. That distance between objects is what is visible in C++, so it is what the standard describes.
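The distance-based operation described above is ordinary pointer subtraction within one array. A minimal sketch, with hypothetical helper names, that measures the byte distance between any two elements of the same array and tests it against an alignment value:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte distance between two elements of the same
// array. Pointer subtraction is well defined here and counts elements of T,
// so scaling by sizeof(T) gives bytes.
template <typename T>
std::ptrdiff_t byte_distance(const T* from, const T* to) {
    return (to - from) * static_cast<std::ptrdiff_t>(sizeof(T));
}

// True if the two elements are a multiple of `alignment` bytes apart.
template <typename T>
bool separated_by_multiple_of(const T* from, const T* to, std::size_t alignment) {
    return byte_distance(from, to) % static_cast<std::ptrdiff_t>(alignment) == 0;
}
```

Because sizeof(T) is always a multiple of alignof(T), separated_by_multiple_of(p, q, alignof(T)) is true for any two elements of the same array, in either order. That relative guarantee is exactly what the standard's wording captures.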

It is true that a program can convert a pointer to an integer type and, in common C++ implementations, observe the address in that integer. However, the details of pointer-to-integer conversion are implementation-defined. An address that is a multiple of an alignment requirement for machine purposes is not required by the C++ standard to convert to an integer that is a multiple of that alignment requirement.

Theoretically, one could make a C++ implementation in which all pointers with low bits 000 mapped to hardware addresses with low bits 101, those with 001 mapped to 110, and so on. The hardware would see the aligned address, but any program running in the C++ implementation would see the pre-mapped address when it converted a pointer to an integer. An object with an alignment requirement of four bytes could have address 1003 in the C++ model but actual hardware address 1008. So the standard allows addresses that are in some sense not multiples of their alignment requirements, but that is not the intent.
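The thought experiment can be checked with plain arithmetic. Assume, purely for illustration, that the implementation adds 5 to the integer a C++ program sees to obtain the hardware address; that offset is one mapping consistent with all three examples in the paragraph (000 to 101, 001 to 110, 1003 to 1008). to_hardware and multiple_of are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mapping (an assumption for illustration only): hardware
// address = integer seen by the C++ program + 5.
constexpr std::uint64_t to_hardware(std::uint64_t cxx_address) {
    return cxx_address + 5;
}

constexpr bool multiple_of(std::uint64_t value, std::uint64_t alignment) {
    return value % alignment == 0;
}

// The examples from the paragraph.
static_assert((to_hardware(0b000) & 0b111) == 0b101, "000 maps to 101");
static_assert((to_hardware(0b001) & 0b111) == 0b110, "001 maps to 110");
static_assert(to_hardware(1003) == 1008, "1003 maps to 1008");

// Misaligned as seen from C++, aligned as seen by the hardware.
static_assert(!multiple_of(1003, 4) && multiple_of(1008, 4),
              "1003 % 4 == 3, 1008 % 4 == 0");
```

Such an implementation would still satisfy the distance-based wording, because adding a constant offset never changes the distance between two addresses.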

huangapple
  • Published on 5 August 2023 at 15:26:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76840566.html