Overhead of moving std::shared_ptr?

Question

Here is a C++ snippet. Func1 creates a shared object, which is moved directly into Func2. We expected no overhead in Func3. When we put this snippet into Compiler Explorer, MSVC produces code that is 2-3 times shorter than the clang or GCC output. Why is that, and is it possible to get the shorter code with clang/GCC?

It looks like the extra code in Func3 is exception-handling code that cleans up the temporary shared object.

#include <memory>

std::shared_ptr<double> Func1();
void Func2(std::shared_ptr<double> s);

void Func3()
{
  Func2(Func1());
}
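The question only declares Func1 and Func2. To make the snippet runnable, here is a sketch with hypothetical bodies for both, plus a hypothetical global `func2_seen_use_count`. It checks that no extra owner is created on the way into Func2. Whatever extra code clang/GCC emit is therefore not a reference-count copy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical bodies: the question only declares Func1 and Func2,
// so these definitions are assumptions made for illustration.
long func2_seen_use_count = 0;

std::shared_ptr<double> Func1() {
    return std::make_shared<double>(3.14);
}

void Func2(std::shared_ptr<double> s) {
    assert(s != nullptr);
    // s is initialized directly from Func1()'s result, so no copy is made
    // and s is the only owner of the double.
    func2_seen_use_count = s.use_count();
}

void Func3() {
    Func2(Func1());
}
```

Calling Func3() and then reading func2_seen_use_count should give 1.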

Answer 1

Score: 6

The problem boils down to platform ABI, and is better illustrated by a completely opaque type:

struct A {
    A(const A&);
    A(A&&);
    ~A();
};

A make() noexcept;
void take(A) noexcept;

void foo() {
    take(make());
}

See the comparison at Compiler Explorer.

MSVC Output

void foo(void) PROC
        push    ecx
        push    ecx
        push    esp
        call    A make(void)
        add     esp, 4
        call    void take(A)
        add     esp, 8
        ret     0
void foo(void) ENDP

GCC Output (clang is very similar)

foo():
        sub     rsp, 24
        lea     rdi, [rsp+15]
        call    make()
        lea     rdi, [rsp+15]
        call    take(A)
        lea     rdi, [rsp+15]
        call    A::~A() [complete object destructor]
        add     rsp, 24
        ret

> If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).

- Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters
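The quoted rule can be made concrete with a hand-lowered approximation of foo() under the Itanium convention. This is not actual compiler output. The names make_into, take_by_address and foo_lowered, the int payload, and the destruction counter are all hypothetical.

The caller owns the storage for the parameter and destroys it after take returns. It also destroys it when take throws, which is what the exception-handling code seen in the question does.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Instrumented stand-in for the opaque A; the counter and payload are
// illustrative additions, not part of the original example.
int a_destructions = 0;

struct A {
    int value;
    explicit A(int v) : value(v) {}
    A(const A&) = delete;  // never needed: the argument is not copied
    ~A() { ++a_destructions; }
};

// Hypothetical lowered make(): constructs its result into caller-provided storage.
A* make_into(void* storage) { return ::new (storage) A(42); }

// Hypothetical lowered take(A): receives its parameter by address and,
// under the Itanium rule, does NOT destroy it.
void take_by_address(A* param, bool fail) {
    assert(param->value == 42);
    if (fail) throw std::runtime_error("take failed");
}

// Roughly what foo() does: the caller destroys the parameter object
// after take returns or throws.
void foo_lowered(bool fail) {
    alignas(A) unsigned char storage[sizeof(A)];
    A* param = make_into(storage);
    struct Cleanup {          // models the compiler-generated cleanup / landing pad
        A* p;
        ~Cleanup() { p->~A(); }
    } cleanup{param};
    take_by_address(param, fail);
}
```

Both foo_lowered(false) and a throwing foo_lowered(true) destroy exactly one A each.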

Explanation

What takes place here is:

  • make() yields a prvalue of type A
  • this prvalue directly initializes the parameter of take(A)
    • mandatory copy elision (C++17) takes place, so no copy or move constructor is called
  • only GCC and clang destroy A at the call site, after take returns
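The list above can be made observable by instrumenting A with counters. The counters and the definitions of make and take are assumptions for this sketch, and C++17 or later is required so that the elision is guaranteed.

The sketch shows three things:
  • take(make()) calls neither the copy constructor nor the move constructor
  • exactly one A is destroyed
  • that A is still alive while take runs

The standard leaves it implementation-defined whether a parameter's lifetime ends when the function returns or at the end of the enclosing full-expression. Both the GCC/clang and MSVC strategies fit within that rule.

```cpp
#include <cassert>

// Instrumented stand-in for the opaque A; counters are illustrative additions.
int copies = 0, moves = 0, destructions = 0;

struct A {
    A() = default;
    A(const A&) { ++copies; }
    A(A&&) noexcept { ++moves; }
    ~A() { ++destructions; }
};

// Returns a prvalue; with C++17 no temporary is materialized here.
A make() noexcept { return A{}; }

int destructions_seen_inside_take = -1;

void take(A) noexcept {
    // Record how many A objects have been destroyed while take() still runs.
    destructions_seen_inside_take = destructions;
}

void foo() {
    take(make());  // the prvalue initializes take's parameter directly
}
```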

MSVC instead destroys the temporary A (or, in your case, the std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing in the GCC/clang output is an inlined version of the std::shared_ptr destructor.
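For contrast, here is a hand-lowered approximation of the callee-destroys strategy described for MSVC. It is not real MSVC output. The function names, the payload, and the counters are hypothetical.

The callee ends the parameter's lifetime before returning, so the caller needs no destructor call and no cleanup path of its own.

```cpp
#include <cassert>
#include <new>

// Instrumented stand-in for A; the payload and counters are illustrative additions.
int msvc_style_destructions = 0;
int destructions_when_take_returned = -1;

struct A {
    int value = 7;
    ~A() { ++msvc_style_destructions; }
};

// Hypothetical lowered take(A): the callee owns the parameter and destroys it.
void take_callee_destroys(A* param) {
    assert(param->value == 7);
    param->~A();  // destruction happens inside the callee
    destructions_when_take_returned = msvc_style_destructions;
}

// Hypothetical lowered foo(): constructs the argument in place and emits
// no destructor call afterwards.
void foo_lowered_msvc() {
    alignas(A) unsigned char storage[sizeof(A)];
    A* param = ::new (storage) A;
    take_callee_destroys(param);
}
```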

In the end, you shouldn't see any major performance impact from this. However, if Func2 resets or releases the shared pointer, most of the destructor code at the call site is, unfortunately, dead code. This ABI problem is similar to an issue with std::unique_ptr:

> There is also a language issue surrounding the order of destruction of function parameters and the
execution of unique_ptr's destructor. For simplicity that is being ignored in this paper, but a complete solution to "unique_ptr is as cheap to pass a T*" would have to address that as well.
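Here is a sketch of the dead-code case, using hypothetical Func1/Func2 bodies and a recording deleter. Once Func2 calls reset(), the double is released inside Func2. The shared_ptr left to be destroyed afterwards is empty, whichever side destroys it.

At run time, the inlined destructor at the GCC/clang call site then essentially just finds a null control block.

```cpp
#include <cassert>
#include <memory>

// Hypothetical instrumentation: a deleter that records when the double is freed.
bool freed = false;
bool freed_before_func2_returned = false;

std::shared_ptr<double> Func1() {
    return std::shared_ptr<double>(new double(2.5), [](double* p) {
        freed = true;
        delete p;
    });
}

// Hypothetical Func2 that gives up ownership early.
void Func2(std::shared_ptr<double> s) {
    s.reset();  // drops the last reference: the double is freed here
    freed_before_func2_returned = freed;
    // s is now empty, so destroying it later does no reference-count work.
}

void Func3() {
    Func2(Func1());
}
```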


See Also

Agner Fog, "Calling conventions for different C++ compilers and operating systems"

huangapple
  • Posted on August 8, 2023 at 20:24:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76859542.html