Why can't GCC optimize a constructor call made via an alias?


Question

A bit inspired by this question.

Given this code:

#include <string>

template<class T>
struct A {
    template <typename U> using NewA = A<U>;
    constexpr A(T const& t){}
    constexpr auto f() const {
        return NewA{"bye"};
    }
};

A(const char*) -> A<std::string>;

int main() {
    A{"hello"}.f();
}

GCC 13.1 generates a lot of useless code (most notably calls to the std::string constructor and destructor, plus some other stuff):

main:
        sub     rsp, 72
        mov     edx, OFFSET FLAT:.LC1+5
        mov     esi, OFFSET FLAT:.LC1
        lea     rax, [rsp+16]
        mov     rdi, rsp
        mov     QWORD PTR [rsp], rax
        call    void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [clone .isra.0]
        lea     rax, [rsp+48]
        mov     edx, OFFSET FLAT:.LC2+3
        mov     esi, OFFSET FLAT:.LC2
        lea     rdi, [rsp+32]
        mov     QWORD PTR [rsp+32], rax
        call    void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [clone .isra.0]
        lea     rdi, [rsp+32]
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose()
        mov     rdi, rsp
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose()
        xor     eax, eax
        add     rsp, 72
        ret

If I replace the line return NewA{"bye"}; with return ::A{"bye"}; (which, in my opinion, is supposed to be exactly the same):

#include <string>

template<class T>
struct A {
    template <typename U> using NewA = A<U>;
    constexpr A(T const& t){}
    constexpr auto f() const {
        return ::A{"bye"};
    }
};

A(const char*) -> A<std::string>;

int main() {
    A{"hello"}.f();
}

the compiler is able to optimize everything down to a single xor:

main:
        xor     eax, eax
        ret

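Is this some kind of "early version bug"? Clang can't even compile this code yet (it doesn't support CTAD via alias templates).

UPD:
It looks like at least GCC 10.1 can optimize everything perfectly.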

Answer 1

Score: 4

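Both of your variants, return ::A{"bye"}; and return NewA{"bye"};, make your program IFNDR (ill-formed, no diagnostic required). (See the note at the bottom, though.)

The deduction guides considered for CTAD are those reachable from the instantiation context in which CTAD is performed.

So in the implicit instantiation caused by A{"hello"}.f(); your user-declared deduction guide is always considered.

However, both NewA{"bye"} and ::A{"bye"} are non-dependent. For a non-dependent construct in a template definition there is an additional requirement: its interpretation in a hypothetical instantiation immediately following the template definition must not differ from its interpretation in any actual instantiation of the template specialization. If this is not satisfied, the program is IFNDR (see [temp.res.general]/6.6).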
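As a related illustration (not the deduction-guide case itself), the classic two-phase lookup example shows a non-dependent call being bound at the point where the template is defined. The names g and S are hypothetical and not part of the question; this is a minimal sketch assuming a conforming C++17 compiler:

```cpp
// Only this overload is visible where the template S is defined.
constexpr int g(double) { return 1; }

template <class T>
struct S {
    // g(1) does not depend on T, so the call is resolved against the
    // declarations visible here, i.e. g(double).
    constexpr int f() const { return g(1); }
};

// Declared after the template definition: a better match for g(1),
// but it is not found by the non-dependent call inside S::f.
constexpr int g(int) { return 2; }

static_assert(S<int>{}.f() == 1, "non-dependent call bound to g(double)");
```

This example is well-formed because lookup for a non-dependent name is specified to happen at the definition. The CTAD case in the question differs in that, as described above, the set of reachable deduction guides is not the same at the definition and at the actual instantiation.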

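In your case, in the hypothetical instantiation immediately after the definition of A, the user-declared deduction guide is not reachable, so CTAD would deduce A<char[4]> instead of A<std::string>, a different interpretation than in the actual instantiation.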
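The two deductions can be checked in isolation. The sketch below uses two hypothetical class templates, NoGuide and WithGuide (not from the question), that differ only in whether a user-declared deduction guide is visible; it assumes C++17 or later:

```cpp
#include <string>
#include <type_traits>

// No user-declared guide: only the implicit guide generated from the
// constructor participates, so T is deduced from the array type.
template <class T>
struct NoGuide {
    constexpr NoGuide(T const&) {}
};

// Same shape, but with the guide from the question visible.
template <class T>
struct WithGuide {
    constexpr WithGuide(T const&) {}
};
WithGuide(const char*) -> WithGuide<std::string>;

// "bye" has type const char[4], so T is deduced as char[4].
static_assert(std::is_same_v<decltype(NoGuide{"bye"}), NoGuide<char[4]>>);

// With the guide visible, the non-template guide is preferred.
static_assert(std::is_same_v<decltype(WithGuide{"bye"}), WithGuide<std::string>>);
```

Both assertions are evaluated at compile time, so the snippet only needs to compile (for example with -std=c++17 -fsyntax-only on GCC or Clang).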

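The rule permits the compiler to do all of the deduction, overload resolution, etc. of non-dependent constructs immediately where they are defined, which I guess is what GCC is doing here for the ::A{"bye"} variant (which is clearly non-dependent), but not for the NewA{"bye"} variant (which is a bit less clear; see the end of this answer). When A<char[4]> is chosen instead of A<std::string>, there is one less std::string temporary to construct, and that probably makes the optimization much simpler for the compiler.

To optimize everything away, the compiler needs to decide to inline all std::string constructor/destructor calls and everything they call, and must then, unless SSO applies, recognize matching operator new/operator delete calls in order to replace them with stack memory. (Generally a call to these allocation functions is observable, because you can replace them anywhere in the program, even at runtime via dynamic linking. But the compiler is allowed to match these calls and omit both if it can provide, e.g., stack memory instead.)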
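To experiment with this kind of elision, a small function whose string cannot fit in the SSO buffer is enough. The function name make_and_drop is hypothetical, and whether the allocation is actually removed depends on the compiler, its version, the standard library, and the optimization flags, so this is only a sketch to inspect in a disassembler or on Compiler Explorer:

```cpp
#include <cstddef>
#include <string>

// Builds a string that is too long for the small-string buffer of the
// common standard libraries, reads its size, and lets it be destroyed.
// Without SSO the constructor has to allocate, so removing all code
// here requires pairing the allocation with its deallocation.
std::size_t make_and_drop() {
    std::string s(64, 'x');  // 64 characters of 'x'
    return s.size();
}
```

The observable result is always 64; only the generated code differs depending on whether the optimizer managed to inline the std::string members and drop the matched allocation.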


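EDIT: Originally I claimed that NewA{"bye"} was dependent, but thinking about it again, both ::A{"bye"} and NewA{"bye"} seem non-dependent, so probably both are IFNDR. It seems that the dependence of placeholders for deduced class types currently isn't clearly specified (see CWG issue 2600), but with the proposed resolution both of your variants would indeed be non-dependent and therefore IFNDR.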
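One way to sidestep the question entirely, offered here as a sketch rather than as something proposed in the original answer, is to avoid CTAD inside the template and name the specialization explicitly. The member function then no longer depends on which deduction guides are reachable. In this version f is deliberately not constexpr, since it constructs a std::string temporary:

```cpp
#include <string>
#include <type_traits>

template <class T>
struct A {
    constexpr A(T const&) {}

    // The target type is spelled out, so no deduction happens here and
    // the meaning is the same at the definition and at instantiation.
    auto f() const {
        return A<std::string>{"bye"};
    }
};

A(const char*) -> A<std::string>;

// CTAD at namespace scope still uses the guide; f() returns the
// explicitly named specialization.
static_assert(std::is_same_v<decltype(A{"hello"}), A<std::string>>);
static_assert(std::is_same_v<decltype(A{"hello"}.f()), A<std::string>>);
```

This changes nothing about how well a given compiler optimizes the result; it only removes the reliance on deduction inside the template body.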

huangapple
  • Posted on July 24, 2023 at 18:35:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753631.html