Why is initializing a string to "" more efficient than the default constructor?

Question

Generally, the default constructor should be the fastest way of making an empty container.
That's why I was surprised to see that it's worse than initializing to an empty string literal:

#include <string>

std::string make_default() {
    return {};
}

std::string make_empty() {
    return "";
}

This compiles to the following (clang 16, libc++):

make_default():
        mov     rax, rdi
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        mov     qword ptr [rdi + 16], 0
        ret
make_empty():
        mov     rax, rdi
        mov     word ptr [rdi], 0
        ret

See live example at Compiler Explorer.

Notice how returning {} is zeroing 24 bytes in total, but returning "" is only zeroing 2 bytes. How come return ""; is so much better?

Answer 1

Score: 49

This is an intentional decision in libc++'s implementation of std::string.

First of all, std::string has so-called Small String Optimization (SSO), which means that for very short (or empty) strings, it will store their contents directly inside of the container, rather than allocating dynamic memory.
That's why we don't see any allocations in either case.
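
One way to observe SSO from the outside is to check whether a string's character buffer lies inside the std::string object itself. The following is a minimal sketch: stored_inline is a hypothetical helper introduced here for illustration, not a library function, and its result depends on the standard library implementation and its SSO capacity.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of any library): reports whether the
// character buffer of `s` is located inside the std::string object,
// which is what SSO implementations do for short strings.
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}
```

With libc++ or libstdc++, an empty string is expected to report true here, whereas a string of 1000 characters has to be stored on the heap and reports false.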

In libc++, the "short representation" of a std::string consists of:

Size (x86_64)   Meaning
1 bit           "short flag" indicating that it is a short string (zero means yes)
7 bits          length of the string, excluding null terminator
0 bytes         padding bytes to align string data (none for basic_string<char>)
23 bytes        string data, including null terminator

For an empty string, we only need to store two bytes of information:

  • one zero-byte for the "short flag" and the length
  • one zero-byte for the null terminator

The constructor accepting a const char* will only write these two bytes, the bare minimum.
The default constructor "unnecessarily" zeroes all 24 bytes that the std::string contains.
This may be better overall though, because it makes it possible for the compiler to emit std::memset or other SIMD-parallel ways of zeroing arrays of strings in bulk.
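
The difference between the two strategies can be sketched with a toy type. ToyShort, init_minimal, init_zeroed and is_empty_short are hypothetical names made up for this illustration; the struct only imitates the layout described above and is not libc++'s actual type. The sketch assumes a typical x86_64 ABI where the two bit-fields share one byte.

```cpp
#include <cstring>

// Toy stand-in for the short representation (an assumption for
// illustration, not libc++'s real definition).
struct ToyShort {
    unsigned char is_long : 1;  // 0 means "short string"
    unsigned char size : 7;     // length without the null terminator
    char data[23];              // characters plus null terminator
};

// What string(const char*) effectively does for "": touch only the
// flag/size byte and the terminator, leaving the other 22 bytes alone.
inline void init_minimal(ToyShort& s) {
    s.is_long = 0;
    s.size = 0;
    s.data[0] = '\0';
}

// What the default constructor effectively does: zero every byte.
inline void init_zeroed(ToyShort& s) {
    std::memset(&s, 0, sizeof s);
}

// Both strategies produce a valid empty short string.
inline bool is_empty_short(const ToyShort& s) {
    return s.is_long == 0 && s.size == 0 && s.data[0] == '\0';
}
```

Either function leaves the object in a state that reads as an empty string; they differ only in how many bytes they write.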

For a full explanation, see below:

Initializing to "" / Calling string(const char*)

To understand what happens, let's look at the libc++ source code for std::basic_string:

// constraints...
/* specifiers... */ basic_string(const _CharT* __s)
  : /* leave memory indeterminate */ {
    // assert that __s != nullptr
    __init(__s, traits_type::length(__s));
    // ...
  }

This ends up calling __init(__s, 0), where 0 is the length of the string, obtained from std::char_traits<char>:

// template head etc...
void basic_string</* ... */>::__init(const value_type* __s, size_type __sz)
{
    // length and constexpr checks
    pointer __p;
    if (__fits_in_sso(__sz))
    {
        __set_short_size(__sz); // set size to zero, first byte
        __p = __get_short_pointer();
    }
    else
    {
        // not entered
    }
    traits_type::copy(std::__to_address(__p), __s, __sz); // copy string, nothing happens
    traits_type::assign(__p[__sz], value_type()); // add null terminator
}

__set_short_size will end up writing only a single byte, because the short representation of a string is:

struct __short
{
    struct _LIBCPP_PACKED {
        unsigned char __is_long_ : 1; // set to zero when active
        unsigned char __size_ : 7;    // set to zero for empty string
    };
    char __padding_[sizeof(value_type) - 1]; // zero size array
    value_type __data_[__min_cap]; // null terminator goes here
};

After compiler optimizations, zeroing __is_long_, __size_, and one byte of __data_ compiles to:

mov word ptr [rdi], 0
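
The short-string branch walked through above can be imitated with the public std::char_traits<char> interface. This is only a sketch: init_short is a hypothetical helper, and it assumes the caller passes a buffer large enough for the string plus its terminator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the short-string branch of __init,
// using only the public std::char_traits<char> interface.
// Assumption: `buf` has room for the characters of `s` plus a terminator.
std::size_t init_short(char* buf, const char* s) {
    using traits = std::char_traits<char>;
    const std::size_t n = traits::length(s);  // 0 for ""
    traits::copy(buf, s, n);                  // copies nothing when n == 0
    traits::assign(buf[n], char());           // writes the null terminator
    return n;
}
```

For the empty literal, the only byte written to the buffer is the terminator at index 0, which matches the single data byte that the generated code zeroes.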

Initializing to {} / Calling string()

The default constructor is more wasteful by comparison:

/* specifiers... */ basic_string() /* noexcept(...) */
  : /* leave memory indeterminate */ {
    // ...
    __default_init();
}

This ends up calling __default_init(), which does:

/* specifiers... */ void __default_init() {
    __r_.first() = __rep(); // set representation to value-initialized __rep
    // constexpr-only stuff...
}

Value-initialization of __rep results in 24 zero bytes, because:

struct __rep {
    union {
        __long  __l; // first union member gets initialized,
        __short __s; // __long representation is 24 bytes large
        __raw   __r;
    };
};
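
The same effect can be reproduced with a toy type. ToyLong, ToyShort, ToyRep and all_bytes_zero are hypothetical names for this sketch, and the 24-byte size assumes x86_64, where size_t and pointers are 8 bytes each.

```cpp
#include <cstddef>

// Toy stand-ins for the two representations (assumptions for
// illustration; these are not libc++'s definitions).
struct ToyLong {
    std::size_t cap;
    std::size_t size;
    char* data;
};

struct ToyShort {
    unsigned char size;
    char data[23];
};

struct ToyRep {
    union {
        ToyLong l;   // first member: this is what value-initialization zeroes
        ToyShort s;
    };
};

// Inspects the object representation byte by byte.
bool all_bytes_zero(const ToyRep& r) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&r);
    for (std::size_t i = 0; i < sizeof r; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```

Assigning ToyRep() to an existing object, as __default_init does with __rep(), overwrites every byte of the union with zero because the first member spans the whole object.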

Conclusions

If you want to value-initialize everywhere for the sake of consistency, don't let this keep you from it. Zeroing out a few bytes unnecessarily isn't a big performance problem you need to worry about.

In fact, it is helpful when initializing large quantities of strings, because std::memset may be used, or some other SIMD way of zeroing out memory.
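
As a small illustration of the bulk case, the sketch below default-constructs many strings at once. make_empty_strings is a hypothetical helper; whether the compiler actually turns the construction into a memset or SIMD stores depends on the compiler, the optimization level, and the standard library, so treat that as something to confirm in the generated assembly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: creates `n` default-constructed (empty) strings.
// With an all-zero empty representation, the optimizer is free to
// zero the whole buffer in bulk instead of writing string by string.
std::vector<std::string> make_empty_strings(std::size_t n) {
    return std::vector<std::string>(n);
}
```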

huangapple
  • Published on June 26, 2023 at 07:40:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76552816.html