overhead for moving std::shared_ptr?
Question
Here is a C++ snippet. Func1 generates a shared object, which is directly moved into Func2. We think that there should be no overhead in Func3. Putting this snippet into Compiler Explorer, we see code that is 2-3 times shorter with MSVC than with clang or GCC. Why is that, and can one obtain the shorter code with clang/GCC?
It looks like Func3 generates exception handling code for cleaning up the temporary shared object.
#include <memory>

std::shared_ptr<double> Func1();
void Func2(std::shared_ptr<double> s);

void Func3()
{
    Func2(Func1());
}
Answer 1
Score: 6
The problem boils down to platform ABI, and is better illustrated by a completely opaque type:
struct A {
    A(const A&);
    A(A&&);
    ~A();
};

A make() noexcept;
void take(A) noexcept;

void foo() {
    take(make());
}
See comparison at Compiler Explorer
MSVC Output
void foo(void) PROC
        push    ecx
        push    ecx
        push    esp
        call    A make(void)
        add     esp, 4
        call    void take(A)
        add     esp, 8
        ret     0
void foo(void) ENDP
GCC Output (clang is very similar)
foo():
        sub     rsp, 24
        lea     rdi, [rsp+15]
        call    make()
        lea     rdi, [rsp+15]
        call    take(A)
        lea     rdi, [rsp+15]
        call    A::~A() [complete object destructor]
        add     rsp, 24
        ret
> If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).
- Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters
Explanation
What takes place here is:
- make() yields a prvalue of type A
- this is fed into the parameter of take(A)
- mandatory copy elision takes place, so there is no call to copy/move constructors
- only GCC and clang destroy A at the call site
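The steps above can be checked with an instrumented stand-in for A. This is only a sketch: the Stats struct, the g_ globals and run_demo() are hypothetical names that do not appear in the original answer, and make() and take() are given trivial bodies so the example links. It counts constructor and destructor calls and records whether the parameter was destroyed before or after take() returned, which is the part that depends on the platform ABI.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical bookkeeping, not part of the original answer.
struct Stats {
    int constructed = 0;
    int copied = 0;
    int moved = 0;
    int destroyed = 0;
    bool destroyed_in_callee = false;
};

static Stats g_stats;
static bool g_take_returned = false;

struct A {
    A() { ++g_stats.constructed; }
    A(const A&) { ++g_stats.copied; }
    A(A&&) noexcept { ++g_stats.moved; }
    ~A() {
        ++g_stats.destroyed;
        // The flag is still false if the callee destroys its own parameter
        // before returning to the caller.
        g_stats.destroyed_in_callee = !g_take_returned;
    }
};

A make() { return A(); }  // yields a prvalue of type A
void take(A) {}           // receives it as a by-value parameter

// Runs take(make()) once and returns what was observed.
Stats run_demo() {
    g_stats = Stats();
    g_take_returned = false;
    // The comma operator sets the flag after take() returns but before the
    // full-expression ends.
    (take(make()), g_take_returned = true);
    assert(g_stats.destroyed == 1);  // the parameter is gone by now
    std::printf("copies: %d, moves: %d, destroyed %s\n",
                g_stats.copied, g_stats.moved,
                g_stats.destroyed_in_callee
                    ? "inside take()"
                    : "in the caller, after take() returned");
    return g_stats;
}
```

Calling run_demo() from main should report zero copies and zero moves on any C++17 compiler. Which destruction site is printed is left to the platform, so it is worth running on both toolchains rather than assuming the result.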
MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing is an inlined version of the std::shared_ptr destructor.
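A minimal sketch of why that destructor code has to be present at the call site under the Itanium ABI: Func3 only sees the declaration of Func2, so it cannot know whether the temporary still owns the object once Func2 returns. The bodies of Func1 and Func2, the g_sink global and demo_use_count() are assumptions made for illustration; the question only declares Func1 and Func2.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical long-lived owner that Func2 hands the object to.
static std::shared_ptr<double> g_sink;

std::shared_ptr<double> Func1() { return std::make_shared<double>(1.5); }

// Assumed body: takes ownership and passes it on, leaving s empty.
void Func2(std::shared_ptr<double> s) {
    g_sink = std::move(s);
    assert(s == nullptr);  // a moved-from shared_ptr is guaranteed to be empty
}

// Same shape as in the question. With GCC/clang the destructor of the
// parameter object runs here, after Func2 returns, even though in this
// sketch it only ever sees an empty pointer.
void Func3() { Func2(Func1()); }

// Runs Func3 once and returns how many owners the object has afterwards.
long demo_use_count() {
    g_sink.reset();
    Func3();
    return g_sink.use_count();
}
```

After demo_use_count() the object has exactly one owner, g_sink, so the destructor run on the parameter did no useful work. The compiler still has to emit it in Func3 because the same call site must also be correct for a Func2 that keeps the pointer intact.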
In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets/releases the shared pointer, then most of the destructor code at the call site is unfortunately dead. This ABI problem is similar to an issue with std::unique_ptr:
> There is also a language issue surrounding the order of destruction of function parameters and the execution of unique_ptr's destructor. For simplicity that is being ignored in this paper, but a complete solution to "unique_ptr is as cheap to pass a T*" would have to address that as well.
See Also
Agner Fog. - Calling conventions for different C++ compilers and operating systems