

Storage of string literals in memory in C++






I read that string literals are always stored in read-only memory, and it makes sense why.

However, if I initialize a character array using a string literal, the literal is still stored in read-only memory and then copied into the memory location of the character array.

My question is: in this scenario, why bother storing the string literal in read-only memory in the first place? Why not store it directly in the memory location of the character array?


Score: 2






> I read that string literals are always stored in read only memory and it makes sense as to why.

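The C++ standard does not say where string literals are stored. It only requires that they have static storage duration, and it makes modifying them undefined behavior. If a compiler decides to emit a large string literal, it will usually be located in a read-only section of static memory, such as .rodata.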

However, whether this is even necessary is up to the compiler. Compilers are allowed to optimize your code according to the as-if rule, so if the behavior of the program is the same with the literal being stored elsewhere, or nowhere at all, that is also allowed.
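To make the distinction concrete, here is a minimal sketch of what the language itself guarantees, regardless of where a compiler places the bytes. The names literal_ptr, modified_copy and literal_size are made up for illustration and are not part of the question. The sketch assumes C++14 or later, because modified_copy uses a local array inside a constexpr function.

```cpp
#include <cstddef>

// A string literal has static storage duration, so a pointer to it
// stays valid after the function returns.
constexpr const char* literal_ptr() {
    return "ab";
}

// An array initialized from a literal is a separate, writable object.
// Changing it never touches the literal itself.
constexpr char modified_copy() {
    char arr[] = "ab";  // arr holds {'a', 'b', '\0'}
    arr[0] = 'x';       // fine: arr is an ordinary local array
    return arr[0];
}

// The size of the literal includes the terminating null character.
constexpr std::size_t literal_size = sizeof("ab");
```

The array only needs the literal's bytes as its initial value. How those bytes reach the array is left to the compiler, which is what the examples that follow show.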

Example 1

    int sum() {
        char arr[] = "ab";
        return arr[0] + arr[1];
    }

With the following assembly output:

    sum():
        mov eax, 195
        ret

In this case, because everything is a compile-time constant, there is no string literal or array in the output at all. The compiler optimized both away and turned our code into return 195; by adding the ASCII codes of a (97) and b (98).
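The same folding can be checked by hand. The sketch below repeats sum() with constexpr added, which is an assumption for illustration (the original function is not constexpr) and requires C++14 or later. It also assumes an ASCII-compatible character set, where 'a' is 97 and 'b' is 98.

```cpp
// Same body as Example 1, marked constexpr so the result can be
// verified at compile time instead of by reading the assembly.
constexpr int sum() {
    char arr[] = "ab";
    return arr[0] + arr[1];
}

// Assumes ASCII: 97 + 98 == 195, the constant seen in the assembly.
static_assert(sum() == 97 + 98, "sum() adds the two character codes");
```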

Example 2

    void consume(const char*);
    void short_string() {
        char arr[] = "short str";
        consume(arr);
    }

With the following assembly output:

    short_string():
        sub rsp, 24
        movabs rax, 8391086215229565043
        mov qword ptr [rsp + 8], rax
        mov word ptr [rsp + 16], 114
        lea rdi, [rsp + 8]
        call consume(char const*)@PLT
        add rsp, 24
        ret

Once again, no code was emitted that would keep the string in read-only memory, but it also wasn't optimized away completely. The compiler sees that the string short str is very short, so it treats the ASCII bytes of its first eight characters as the 64-bit number 8391086215229565043 and movs that number directly onto the stack. The remaining r and the null terminator follow as the 16-bit value 114. consume() is then called with a pointer to stack memory.
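The two immediates can be reproduced by packing the characters in little-endian order, the byte order x86-64 uses. This is only a sketch to show where the numbers come from: pack_le8 and pack_le2 are hypothetical helpers, not something the compiler or the standard library provides, and they assume C++14 or later and an ASCII-compatible character set.

```cpp
#include <cstdint>

// Packs the first 8 bytes of s into a 64-bit integer the way a
// little-endian machine reads them from memory: s[0] becomes the
// lowest byte. The caller must pass at least 8 readable chars.
constexpr std::uint64_t pack_le8(const char* s) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i]))
                 << (8 * i);
    }
    return value;
}

// Same idea for 2 bytes, used here for the trailing 'r' and '\0'.
constexpr std::uint16_t pack_le2(const char* s) {
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(s[0]) |
        (static_cast<unsigned char>(s[1]) << 8));
}
```

Calling pack_le8("short str") gives the movabs constant, and pack_le2("short str" + 8) gives 114, so the two stores together write all ten bytes of the array.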

Example 3

    void long_string() {
        char arr[] = "Lorem ipsum dolor [...] est laborum.";
        consume(arr);
    }

With the following assembly output:

    long_string():
        push rbx
        sub rsp, 448
        lea rsi, [rip + .L__const.long_string().arr]
        mov rbx, rsp
        mov edx, 446
        mov rdi, rbx
        call memcpy@PLT
        mov rdi, rbx
        call consume(char const*)@PLT
        add rsp, 448
        pop rbx
        ret
    .L__const.long_string().arr:
        .asciz "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

Our string is now much too long to be treated as a number or two. The entire string is emitted into static memory, most likely .rodata after linking. The initial bytes of arr have to live somewhere in the program, and keeping them in static memory is still helpful, because the compiler can use memcpy to copy them from there onto the stack when initializing arr.
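Written out by hand, the generated code corresponds roughly to the sketch below. Several things here are assumptions for illustration: kArrInit stands in for the compiler-generated .L__const symbol, the text is shortened, and consume() is given a made-up body that returns the string length, since the original only declares it.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the static data the compiler emitted. The name and the
// shortened text are illustrative only.
static const char kArrInit[] = "Lorem ipsum dolor sit amet";

// Hypothetical consume(): returns the length so that the effect of the
// copy is observable by the caller.
std::size_t consume(const char* s) {
    return std::strlen(s);
}

// Hand-written equivalent of: char arr[] = "..."; consume(arr);
std::size_t long_string_by_hand() {
    char arr[sizeof kArrInit];                    // stack buffer, terminator included
    std::memcpy(arr, kArrInit, sizeof kArrInit);  // copy static data onto the stack
    return consume(arr);
}
```

The static copy is the source operand of memcpy here, which is why the compiler keeps it around even though the program only ever uses the array.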


If you're worried about compilers doing something wasteful here, don't be. Modern compilers are very good at optimizing and deciding which symbols go where, and if they emit a string literal, this is usually because it must exist for some other code to work, or because it makes initialization of an array easier.

See live examples with Compiler Explorer

  • Posted on June 12, 2023 at 20:01:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76456460-2.html



