Why is assigning a container's element to the container (not) a well-defined C++?

Question

In C++ there is the infamous problem of self-assignment: when implementing operator=(const T &other), one has to handle the this == &other case carefully so as not to destroy *this's data before copying it from other.

However, *this and other may interact in more interesting ways than being the same object. Namely, one may contain the other. Consider the following code:

#include <iostream>
#include <string>
#include <utility>
#include <vector>
struct Foo {
    std::string s = "hello world very long string";
    std::vector<Foo> children;
};
int main() {
    std::vector<Foo> f(4);
    f[0].children.resize(2);
    f = f[0].children;  // (1)
    // auto tmp = f[0].children; f = std::move(tmp);  // (2)
    std::cout << f.size() << "\n";
}

I'd expect lines (1) and (2) to be equivalent: the program is well-defined and prints 2. However, I have yet to find a compiler+standard library combination that runs line (1) with Address Sanitizer enabled without crashing: GCC+libstdc++, Clang+libc++ and Visual Studio+Microsoft STL all crash.

Curiously, disabling Address Sanitizer removes the crash and the program starts printing 2.

Why is this operation prohibited or permitted in the standard C++?

Extra question: the same, but with f[0].children = f. Extra-extra question: use std::any instead of std::vector<Foo>.

Answer 1

Score: 5

I'm not convinced that (1) is well-defined: to copy a new value into f[0], the old object residing at that location must first be destroyed, or at the very least modified, even though the argument being copied from (f[0].children) lives inside it and is under the contract of being const.

From std::vector<T,Allocator>::operator= (emphasis mine):

> If the allocator of *this after assignment would compare unequal to its old value, the old allocator is used to deallocate the memory, then the new allocator is used to allocate it before copying the elements. Otherwise, the memory owned by *this may be reused when possible. In any case, the elements originally belonging to *this may be either destroyed or replaced by element-wise copy-assignment.

So in all of the scenarios above, the object may be destroyed before it is copied, and you fall into the territory of behavior that is either undefined or implementation-specific.

In practical terms, for the vector to reuse this memory it generally has to destroy the old element (an explicit destructor call, loosely a "placement delete") and then construct the new one with placement new; in that case, too, the referenced object being copied is destroyed in the process.

Even in the most lenient scenario (i.e. "replaced by element-wise copy-assignment"), you begin with Foo::operator=(const Foo&) invoked on f[0] to replace it with a copy of f[0].children[0]. The vector f[0].children[0].children is empty, so the copy destroys both elements of f[0].children while leaving the target vector's capacity (which is 2) unchanged. Before the next element is even reached, the const Foo& that was originally being copied has already been destroyed (it is one of those two elements), breaking its contract, and all bets are off.

I don't think there's any automatic way to protect against that, short of perhaps using some kind of custom garbage-collecting allocator. You simply need to recognize the self-referential problem and avoid it. You worked around the problem in (2) by introducing a copy, and that is at least well-defined. It can be taken one step further by moving the data out of the container first:

auto tmp = std::move(f[0].children);
f = std::move(tmp);

Perhaps the problem can be more generally worked around with careful application of std::shared_ptr, since your main issue is the destruction of data that you expected to still be referenced.

I think the whole contract-breaking-of-const-object issue is really the key to answering your "extra" question about f[0].children = f without getting too deep into details. In that case, f[0].children may be reallocated because it must grow from two elements to four, and in doing so it modifies f (f[0] is one of f's elements), which was supposed to be const.

huangapple
  • Published on August 9, 2023 at 08:30:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863864.html