

When does the default copy assignment operator from C++11 utilize bit-wise copy instead of member-wise copy?



In both x86-64 GCC 13.1 and Clang 16.0.0, the copy<PrivateBase> function uses member-wise copy, while the copy<PublicBase> function uses bit-wise copy. You can refer to the detailed source code and assembly on Compiler Explorer, or see the code snippets below:

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// __attribute_noinline__ is a glibc macro for __attribute__((noinline)).
template<typename T>
__attribute_noinline__ void copy(T *dst, T *src) {
    *dst = *src;
}

// Explicit instantiations, so that both functions appear in the assembly.
template void copy(PublicBase *dst, PublicBase *src);
template void copy(PrivateBase *dst, PrivateBase *src);

Generated assembly:

void copy<PublicBase>(PublicBase*, PublicBase*):
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
void copy<PrivateBase>(PrivateBase*, PrivateBase*):
        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], eax
        movzx   eax, BYTE PTR [rsi+4]
        mov     BYTE PTR [rdi+4], al

The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.


According to cppreference-is_trivially_copyable:
> Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy.

Both PublicBase and PrivateBase are trivially copyable, and the objects being copied are not subobjects, yet PrivateBase is copied member-wise instead of bit-wise.
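
As a minimal sketch of this observation, the traits can be checked at compile time. The class definitions are repeated from the question; the helper names `trivially_copyable_and_assignable` and `check_traits` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <type_traits>

// Same shapes as in the question; only the access specifier differs.
class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// Hypothetical helper: true when T is trivially copyable and its
// copy assignment operator is trivial.
template <typename T>
constexpr bool trivially_copyable_and_assignable() {
    return std::is_trivially_copyable<T>::value &&
           std::is_trivially_copy_assignable<T>::value;
}

// Both types pass, so these traits alone cannot explain why the
// compilers emit different code for the two copy<T> instantiations.
static_assert(trivially_copyable_and_assignable<PublicBase>(),
              "PublicBase should be trivially copyable");
static_assert(trivially_copyable_and_assignable<PrivateBase>(),
              "PrivateBase should be trivially copyable");

// Hypothetical run-time entry point, callable from main().
inline void check_traits() {
    assert(trivially_copyable_and_assignable<PublicBase>());
    assert(trivially_copyable_and_assignable<PrivateBase>());
}
```

Since both static assertions hold, whatever distinguishes the two types for code generation has to come from somewhere other than these traits.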


If PublicBase and PrivateBase each have a derived class, the class derived from PrivateBase reuses the tail padding of its base class, while the class derived from PublicBase does not.

It is therefore reasonable that PrivateBase is copied member-wise. Otherwise, the padding bytes of the base object could overwrite PrivateDerived::c2 when calling copy<PrivateBase>(derived, base).

#include <iostream>

class PublicDerived : public PublicBase {
public:
    char c2;
};

class PrivateDerived : public PrivateBase {
private:
    char c2;
};

int main() {
    std::cout << "sizeof(PublicBase)=" << sizeof(PublicBase) << std::endl;
    std::cout << "sizeof(PublicDerived)=" << sizeof(PublicDerived) << std::endl;
    std::cout << "sizeof(PrivateBase)=" << sizeof(PrivateBase) << std::endl;
    std::cout << "sizeof(PrivateDerived)=" << sizeof(PrivateDerived) << std::endl;

    return 0;
}
// Output:
// sizeof(PublicBase)=8
// sizeof(PublicDerived)=12
// sizeof(PrivateBase)=8
// sizeof(PrivateDerived)=8
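
The sizes alone do not show where c2 ends up. The following sketch measures its byte offset directly. It assumes c2 is public in both derived classes so that its address can be taken (that differs from a private c2, but member access does not change where a single member is placed here), and the helpers `offset_of_c2` and `check_layout` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// Assumption for this sketch: c2 is public so its address is accessible.
class PublicDerived : public PublicBase {
public:
    char c2;
};

class PrivateDerived : public PrivateBase {
public:
    char c2;
};

// Hypothetical helper: byte offset of c2 from the start of a Derived object.
template <typename Derived>
std::ptrdiff_t offset_of_c2() {
    Derived d;  // only addresses are taken, no member is read
    const char* object_start = reinterpret_cast<const char*>(&d);
    const char* member_start = &d.c2;
    return member_start - object_start;
}

// Sanity checks that hold on any ABI: c2 lies inside its object.
inline void check_layout() {
    assert(offset_of_c2<PublicDerived>() <
           static_cast<std::ptrdiff_t>(sizeof(PublicDerived)));
    assert(offset_of_c2<PrivateDerived>() <
           static_cast<std::ptrdiff_t>(sizeof(PrivateDerived)));
}
```

With the sizes reported above (GCC and Clang on x86-64), one would expect offset 8 for PublicDerived::c2 and offset 5 for PrivateDerived::c2, that is, directly after c1 and inside what would otherwise be the tail padding of PrivateBase. Other ABIs may place the member differently.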

I am confused about how the compiler decides whether to reuse the padding of the base class.
According to the related question, the padding of a base class is not reused when the base class is a POD type.

According to cppreference-POD_class:
> A POD class is a class that
> - until C++11:
>   - is an aggregate (no private or protected non-static data members),
>   - has no user-declared copy assignment operator,
>   - has no user-declared destructor, and
>   - has no non-static data members of type non-POD class (or array of such types) or reference.
> - since C++11:
>   - is a trivial class,
>   - is a standard-layout class (has the same access control for all non-static data members), and
>   - has no non-static data members of type non-POD class (or array of such types).

Before C++11, PrivateBase was not a POD type (because it has private data members), but since C++11 it is a POD type (because all of its non-static data members have the same access control).

#include <iostream>
#include <type_traits>

int main() {
    std::cout << "PublicBase: is_standard_layout=" << std::is_standard_layout<PublicBase>::value
              << ", is_trivial=" << std::is_trivial<PublicBase>::value
              << ", is_pod=" << std::is_pod<PublicBase>::value << std::endl;

    std::cout << "PrivateBase: is_standard_layout=" << std::is_standard_layout<PrivateBase>::value
              << ", is_trivial=" << std::is_trivial<PrivateBase>::value
              << ", is_pod=" << std::is_pod<PrivateBase>::value << std::endl;

    return 0;
}
// Output:
// PublicBase: is_standard_layout=1, is_trivial=1, is_pod=1
// PrivateBase: is_standard_layout=1, is_trivial=1, is_pod=1


Answer (score: 5)











> The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.

First, a minor correction on terminology: you probably mean the implicitly-defined copy assignment operator. This is different from an implicitly-declared copy assignment operator and from a defaulted or explicitly-defaulted copy assignment operator.

The implicitly-defined copy assignment operator always uses member-wise copy, except for unions, for which the object representation is copied instead (i.e. byte-wise as if by memcpy).
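
A small sketch can make the member-wise behavior visible. The types `Traced` and `Pair`, the counter `assignment_calls`, and the function `count_member_assignments` are hypothetical names invented for this illustration: the member type counts how often its own copy assignment runs, and the enclosing struct relies on the implicitly-defined operator.

```cpp
#include <cassert>

// Counts how many times Traced::operator= has run.
int assignment_calls = 0;

// Hypothetical member type with a user-provided copy assignment operator.
struct Traced {
    int value = 0;

    Traced& operator=(const Traced& other) {
        value = other.value;
        ++assignment_calls;
        return *this;
    }
};

// No user-declared copy assignment: the compiler defines one implicitly,
// and that operator assigns each member in turn.
struct Pair {
    Traced first;
    Traced second;
};

// Returns the number of member assignments triggered by one Pair assignment.
int count_member_assignments() {
    Pair a;
    Pair b;
    b.first.value = 1;
    b.second.value = 2;

    assignment_calls = 0;
    a = b;  // implicitly-defined Pair::operator=

    assert(a.first.value == 1 && a.second.value == 2);
    return assignment_calls;
}
```

Calling `count_member_assignments()` returns 2, one call per member, which is what member-wise copy means at the language level.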

However, the value of padding bytes is unspecified, so the compiler does not need to care about overwriting them if it knows that they are indeed only padding, i.e. not reused for members of a derived class.

Then, if the compiler knows that the assignment operator is equivalent to copying the members' object representations directly, e.g. because the copy assignment operator is trivial, it can replace the member-wise copy with a copy of the object representation of the whole object. This does not affect any observable behavior, since the only difference is the resulting padding values, which are unspecified anyway. Even if the copy assignment is not trivial, the compiler might see, e.g. after inlining, that the observable behavior would not be affected by this optimization. Anything is permitted as long as the observable behavior does not change to one that was not permitted on the abstract machine (the "as-if" rule).
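
To illustrate why such a replacement is unobservable, here is a minimal sketch that copies a complete object once member by member and once with std::memcpy, then compares the members. The struct `Record` and the function `whole_object_copy_matches_memberwise` are hypothetical names; `Record` simply has the same members as PublicBase.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in with the same members as PublicBase.
struct Record {
    int num;
    char c1;
};

static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only valid for trivially copyable types");

// Copies the same source object in two ways and compares the results.
// Only the members are compared, because padding values are unspecified.
bool whole_object_copy_matches_memberwise() {
    Record src{42, 'x'};

    // Member-wise copy, as the abstract machine describes it.
    Record by_members{0, '\0'};
    by_members.num = src.num;
    by_members.c1 = src.c1;

    // Copy of the whole object representation, padding included.
    // Both objects are complete objects, not base class subobjects.
    Record by_bytes{0, '\0'};
    std::memcpy(&by_bytes, &src, sizeof(Record));

    assert(by_members.num == 42 && by_members.c1 == 'x');
    return by_members.num == by_bytes.num && by_members.c1 == by_bytes.c1;
}
```

The function returns true: a program cannot tell the two copies apart through the members, so a compiler is free to pick whichever is cheaper for a complete object.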

> I am confused about how the compiler decides to reuse padding of the base class or not. According to the related question, the POD type doesn't reuse padding of the base class.

This is not specified by the standard. It is up to the compiler to decide under which circumstances padding is reused and that does not need to coincide with the POD property. In fact the POD concept is deprecated and not used by current standard versions any more except for the deprecated is_pod type trait.

Even more so, the standard says that every base class subobject is potentially-overlapping. This property is used to define whether copying a trivially-copyable object with memcpy is permitted, and because every base class subobject is potentially-overlapping, the standard in theory allows the tail padding of any class to be reused. However, this would obviously break C compatibility for class types that are also valid C structs, so a compiler is not going to be that aggressive.

Because reuse of padding affects ABI compatibility between translation units, the compiler follows a fixed rule in order to maintain binary compatibility between them. Usually there is an ABI specification for the compiler/platform combination.

GCC and Clang follow the Itanium C++ ABI, which specifies the concept of "POD for the purpose of layout". It is explicitly based on the POD definition from the C++03 standard, excluding some special cases and with some clarifications. This concept, not the C++ standard's concept of "POD", is used to decide whether tail padding is reused in the Itanium C++ ABI.

In C++03, PublicBase was POD but PrivateBase was not, so the former is POD for the purpose of layout while the latter is not. Consequently, GCC and Clang reuse tail padding only for the latter.

When tail padding is potentially reused, the compiler cannot copy the whole object representation in the implicit copy assignment operator, because that could modify a byte belonging to a member of a derived class, as you already noticed. That could affect the observable behavior and is therefore not covered by the "as-if" rule.
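
The guarantee that follows from this can be sketched as below: assigning to the base class subobject of a derived object must leave the derived member intact, whether or not it sits in the base's tail padding. The accessors `set` and `number`, the public `c2`, the constant `tail_padding_reused`, and the function name are assumptions added for this illustration (member functions do not change the layout).

```cpp
#include <cassert>

// Same data members as PrivateBase in the question, plus hypothetical
// accessors so that the private members can be set and read.
class PrivateBase {
public:
    void set(int n, char c) { num = n; c1 = c; }
    int number() const { return num; }

private:
    int num;
    char c1;
};

// Assumption for this sketch: c2 is public so it can be inspected.
class PrivateDerived : public PrivateBase {
public:
    char c2;
};

// ABI-dependent: true where c2 is placed in the tail padding of the base.
constexpr bool tail_padding_reused =
    sizeof(PrivateDerived) == sizeof(PrivateBase);

// Assigns a PrivateBase into the base part of a PrivateDerived and
// reports whether the base was updated and c2 was left untouched.
bool base_assignment_preserves_derived_member() {
    PrivateDerived derived;
    derived.set(1, 'a');
    derived.c2 = 'z';

    PrivateBase base;
    base.set(2, 'b');

    PrivateBase& base_part = derived;
    base_part = base;  // implicitly-defined PrivateBase::operator=

    return derived.number() == 2 && derived.c2 == 'z';
}
```

The function returns true on any conforming implementation. On an ABI where `tail_padding_reused` is true, this is exactly why the compiler emits the 4-byte and 1-byte moves for copy<PrivateBase> instead of a single 8-byte move.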

  • Posted on June 25, 2023 at 21:16:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76550587.html



