When does the default copy assignment operator from C++11 utilize bit-wise copy instead of member-wise copy?

Question

With both x86-64 GCC 13.1 and Clang 16.0.0, the copy<PrivateBase> function is compiled to a member-wise copy, while the copy<PublicBase> function is compiled to a bit-wise copy. The detailed source code and assembly are on Compiler Explorer; the relevant snippets are below:

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};


template<typename T>
__attribute__((noinline)) void copy(T *dst, T *src) {
    *dst = *src;
}

// Explicit instantiations, so that both functions are emitted.
template void copy(PublicBase *dst, PublicBase *src);
template void copy(PrivateBase *dst, PrivateBase *src);

Generated assembly:

void copy<PublicBase>(PublicBase*, PublicBase*):
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
        ret
void copy<PrivateBase>(PrivateBase*, PrivateBase*):
        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], eax
        movzx   eax, BYTE PTR [rsi+4]
        mov     BYTE PTR [rdi+4], al
        ret

The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.

is_trivially_copyable

According to cppreference-is_trivially_copyable:
> Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy.

Both PublicBase and PrivateBase are trivially copyable and not subobjects, but PrivateBase is copied member-wise rather than bit-wise.
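The claim that both classes are trivially copyable can be checked at compile time. This is a minimal sketch that repeats the two class definitions from the question so that it stands alone; it only confirms the trait and says nothing about which instructions a compiler emits.

```cpp
#include <type_traits>

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// Both checks pass: access control has no influence on trivial copyability.
static_assert(std::is_trivially_copyable<PublicBase>::value,
              "PublicBase should be trivially copyable");
static_assert(std::is_trivially_copyable<PrivateBase>::value,
              "PrivateBase should be trivially copyable");
```

Since the trait is identical for both classes, it cannot be what distinguishes the two code paths in the assembly above.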

is_pod

If PublicBase and PrivateBase each have a derived class, the class derived from PrivateBase reuses the padding of its base class, while the one derived from PublicBase doesn't.

Therefore, it is reasonable that PrivateBase is copied member-wise. Otherwise, copying the padding of the base class could overwrite PrivateDerived::c2 when calling copy<PrivateBase>(derived, base).


#include <iostream>

class PublicDerived : public PublicBase {
public:
    char c2;
};

class PrivateDerived : public PrivateBase {
private:
    char c2;
};


int main() {
    std::cout << "sizeof(PublicBase)=" << sizeof(PublicBase) << std::endl;
    std::cout << "sizeof(PublicDerived)=" << sizeof(PublicDerived) << std::endl;
    std::cout << "sizeof(PrivateBase)=" << sizeof(PrivateBase) << std::endl;
    std::cout << "sizeof(PrivateDerived)=" << sizeof(PrivateDerived) << std::endl;

    return 0;
}
// Output:
// sizeof(PublicBase)=8
// sizeof(PublicDerived)=12
// sizeof(PrivateBase)=8
// sizeof(PrivateDerived)=8

I am confused about how the compiler decides to reuse padding of the base class or not.
According to the related question, the POD type doesn't reuse padding of the base class.

According to the cppreference-POD_class:
> A POD class is a class that
> - until C++11:
>   - is an aggregate (no private or protected non-static data members),
>   - has no user-declared copy assignment operator,
>   - has no user-declared destructor, and
>   - has no non-static data members of type non-POD class (or array of such types) or reference.
> - since C++11:
>   - is a trivial class,
>   - is a standard-layout class (has the same access control for all non-static data members), and
>   - has no non-static data members of type non-POD class (or array of such types).

Before C++11, PrivateBase was not a POD type (because it has private data members), but since C++11 it is a POD type (because all of its non-static data members have the same access control).


#include <iostream>
#include <type_traits>

int main() {
    std::cout << "PublicBase: is_standard_layout=" << std::is_standard_layout<PublicBase>::value
              << ", is_trivial=" << std::is_trivial<PublicBase>::value
              << ", is_pod=" << std::is_pod<PublicBase>::value << std::endl;

    std::cout << "PrivateBase: is_standard_layout=" << std::is_standard_layout<PrivateBase>::value
              << ", is_trivial=" << std::is_trivial<PrivateBase>::value
              << ", is_pod=" << std::is_pod<PrivateBase>::value << std::endl;
}
// Output:
// PublicBase: is_standard_layout=1, is_trivial=1, is_pod=1
// PrivateBase: is_standard_layout=1, is_trivial=1, is_pod=1

Answer 1

Score: 5

> The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.

First, a minor correction on terminology: you probably mean the implicitly-defined copy assignment operator. This is distinct from the implicitly-declared copy assignment operator and from a defaulted or explicitly-defaulted copy assignment operator.

The implicitly-defined copy assignment operator always uses member-wise copy, except for unions, for which the object representation is copied instead (i.e. byte-wise as if by memcpy).
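To make "member-wise" concrete, the sketch below places a class that relies on the implicitly-defined operator next to one that spells out the same assignments by hand. The types Implicit and HandWritten and the helper assign_and_check are hypothetical names chosen for illustration; the union case mentioned above is not shown.

```cpp
#include <cassert>

// Hypothetical example type. No copy assignment operator is declared, so
// the compiler declares one implicitly and defines it when it is used.
struct Implicit {
    int num;
    char c1;
};

// Hypothetical example type that writes out what the implicitly-defined
// operator does for a non-union class: assign each non-static data member
// in declaration order.
struct HandWritten {
    int num;
    char c1;

    HandWritten& operator=(const HandWritten& other) {
        num = other.num;
        c1 = other.c1;
        return *this;
    }
};

// Assigns src to dst and checks that every member was copied.
template <typename T>
void assign_and_check(T& dst, const T& src) {
    dst = src;
    assert(dst.num == src.num);
    assert(dst.c1 == src.c1);
}
```

Calling assign_and_check on a pair of Implicit objects and on a pair of HandWritten objects gives the same member values in both cases. Only the member values are defined by the language; padding bytes are not part of that contract.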

However, the value of padding is unspecified, so the compiler doesn't need to care about overwriting it if it knows that it is indeed only padding, i.e. not reused for members of a derived class.

Then, if the compiler knows that the assignment operator is equivalent to copying the members' object representations directly, e.g. if the copy assignment operator is trivial, it can replace the member-wise copy with a copy of the object representation of the whole object. This doesn't affect any observable behavior, since the only difference, the resulting padding values, is unspecified anyway. Even if the copy assignment is not trivial, the compiler might see, e.g. after inlining, that the observable behavior wouldn't be affected by this optimization. Anything is permitted as long as the observable behavior doesn't change to one that wasn't permitted on the abstract machine (the "as-if" rule).
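A rough illustration of why the two strategies are interchangeable for PublicBase: for complete objects of a trivially copyable type, assignment and a whole-object std::memcpy leave the members with the same values. The function names below are hypothetical, and the sketch assumes that both pointers refer to complete objects rather than base class subobjects.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class PublicBase {
public:
    int num;
    char c1;
};

// Member-wise copy, as the abstract machine describes it.
void copy_by_assignment(PublicBase* dst, const PublicBase* src) {
    *dst = *src;
}

// Copy of the whole object representation, padding included.
// Assumption: dst and src point to complete PublicBase objects.
void copy_by_memcpy(PublicBase* dst, const PublicBase* src) {
    static_assert(std::is_trivially_copyable<PublicBase>::value,
                  "memcpy requires a trivially copyable type");
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, sizeof(PublicBase));
}

// Compares only what a program can rely on: the member values.
bool same_members(const PublicBase& a, const PublicBase& b) {
    return a.num == b.num && a.c1 == b.c1;
}
```

After either call, same_members(*dst, *src) holds. The padding bytes may differ between the two strategies, but no conforming program can depend on them.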

> I am confused about how the compiler decides to reuse padding of the base class or not. According to the related question, the POD type doesn't reuse padding of the base class.

This is not specified by the standard. It is up to the compiler to decide under which circumstances padding is reused, and that does not need to coincide with the POD property. In fact, the POD concept is deprecated and no longer used by current standard versions, except for the deprecated is_pod type trait.

Even more so, the standard says that every base class subobject is potentially-overlapping. This property is used to define whether copying a trivially-copyable object with memcpy is permitted, and because every base class subobject is potentially-overlapping, the standard in theory allows the tail padding of any class to be reused. Obviously, this would mess up C compatibility for class types that are also valid C structs, so a compiler isn't going to be that aggressive.

Because reuse of padding affects ABI compatibility between translation units, the compiler will follow a general rule to maintain binary compatibility between them. Usually there is an ABI specification for the compiler/platform combination.

GCC and Clang follow the Itanium C++ ABI, which specifies the concept of "POD for the purpose of layout". It is explicitly based on the POD definition from the C++03 standard, excluding some special cases and with some clarifications. This concept, not the C++ standard's concept of "POD", is used to decide whether tail padding is reused in the Itanium C++ ABI.

In C++03, PublicBase was POD but PrivateBase wasn't, so the former is POD for the purpose of layout while the latter isn't. Consequently, GCC and Clang reuse tail padding only for the latter.
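Whether a given compiler reuses the tail padding of a base class can be probed with sizeof alone. The sketch below uses a hypothetical TailPaddingProbe template and a hypothetical reuses_tail_padding helper; the result is a property of the ABI in use, not of the C++ standard.

```cpp
class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// Hypothetical probe: derives from Base and adds a single char. If that
// char is placed in Base's tail padding, the probe is no larger than Base.
template <typename Base>
struct TailPaddingProbe : Base {
    char extra;
};

// True if the extra member fit into the tail padding of Base.
// A Base without tail padding always yields false.
template <typename Base>
constexpr bool reuses_tail_padding() {
    return sizeof(TailPaddingProbe<Base>) == sizeof(Base);
}
```

Going by the sizes printed in the question (8, 12, 8, 8), reuses_tail_padding<PublicBase>() should be false and reuses_tail_padding<PrivateBase>() should be true with GCC and Clang on x86-64. Other ABIs may give different answers.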

When tail padding is potentially reused, the compiler can't copy the whole object representation for the implicit copy assignment operator, because that could modify a byte of a derived class's member, as you already noticed. That could affect the observable behavior and therefore would not be covered by the "as-if" rule.
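The guarantee the compiler has to preserve can be written as a small check: assigning to the base subobject of a derived object must leave the derived member untouched, wherever that member is stored. The names Derived, copy_base, and assign_over_base are hypothetical; copy_base has the same shape as copy<PrivateBase> from the question.

```cpp
#include <cassert>

class PrivateBase {
private:
    int num;
    char c1;
};

// Hypothetical derived class. Under the Itanium C++ ABI, c2 is expected
// to be placed in the tail padding of PrivateBase.
struct Derived : PrivateBase {
    char c2;
};

// Same shape as copy<PrivateBase> from the question.
void copy_base(PrivateBase* dst, const PrivateBase* src) {
    *dst = *src;
}

// Overwrites the base subobject of a Derived with a zero-initialized
// PrivateBase and returns the derived member afterwards.
char assign_over_base(char initial) {
    Derived d{};
    d.c2 = initial;
    PrivateBase fresh{};
    copy_base(&d, &fresh);
    assert(d.c2 == initial);  // must hold on any conforming implementation
    return d.c2;
}
```

An 8-byte copy inside copy_base would break this check on an ABI that stores c2 at offset 5, which is consistent with the 4-byte plus 1-byte copy in the assembly for copy<PrivateBase>.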

huangapple
  • Posted on June 25, 2023, 21:16:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76550587.html