

Is having a buffer of unsigned char and treating it as a pointer to T* a violation of strict aliasing?






Some sources say the following is a violation of strict aliasing:

#include <stdlib.h>

typedef struct Foo {
    int a;
    double* b;
} Foo;

int main(void) {
    _Alignas(Foo) unsigned char buffer[2048];
    Foo* a = (Foo*)&buffer[0];
    a->a = 44;
    a->b = NULL;
}

GCC at least does not throw an error.

If this is undefined behaviour, how would any allocator be implemented, especially bump allocators that make use of such an unsigned char buffer?

I certainly know that

Foo MyFoo;
unsigned char* byteRepresentationOfFoo = (unsigned char*)&MyFoo;

is allowed, since unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?


Score: 4


Yes it is a strict aliasing violation, undefined behavior. The declared type is unsigned char [2048] so the "effective type" is the same (C17 6.5 §6).

a->a = 44 and the similar assignments are accesses through an lvalue expression whose type is not compatible with unsigned char (C17 6.5 §7).

As a side note, it would have been OK to do this if Foo were a struct/union containing a character type array among its members (C17 6.5 §7), but that is not the case here.

> GCC at least does not throw an error

It's not gcc's job to report undefined behavior. The various strict aliasing warnings have always been pretty broken and gcc is also known to be lax when it comes to warning for non-standard extensions in general. clang doesn't throw any diagnostics either.

Related: https://stackoverflow.com/questions/65842647/why-did-gcc-stop-warning-about-strict-aliasing-violation-from-version-7-2 The answer is likely "because bugs". In recent times, they are rolling out so many poorly tested changes to the compiler. It took gcc 28 years to get from version 1 to version 5, but from there on it has been one major version release per year up to 13.x... They rolled out 8 major versions from 2015-2023 while there was only 1 minor revision to the actual C language in that time.

> how would any allocator be implemented

It can't, or at least not by taking a raw character buffer and wildly casting pointers into it. malloc and the like are library functions which may be implemented in non-standard C or in another language entirely.

Now as it happens most standard libs are actually written in C, glibc etc. But if you compile those with a strict C compiler rather than with specialized options like gcc -fno-strict-aliasing, then there are no guarantees. Follow the build instructions of the library implementors.

You can implement allocators by type punning unions though:

typedef union {
  Foo foo;
  unsigned char buf[n];
} pun_intended_t;

_Alignas(Foo) pun_intended_t pun = { .buf =  { something } };
Foo* f = (Foo*)pun.buf; // well-defined as long as aligned

This utilizes another exception from "strict aliasing".
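
As a minimal sketch of the union approach with the placeholders filled in: FooStorage and store_and_read are hypothetical names, not part of the code above. The sketch accesses the members by name through the union object, which is the form of union type punning that C17 6.5.2.3 describes. No _Alignas is needed here because the union already inherits the alignment of Foo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Hypothetical storage type: Foo is part of the declared type of the object. */
typedef union FooStorage {
    Foo foo;
    unsigned char buf[sizeof(Foo)];
} FooStorage;

/* The byte member is sized so that it covers exactly one Foo. */
static_assert(sizeof(FooStorage) == sizeof(Foo), "buf must cover one Foo");

/* Fills the storage byte-wise, then reads it back through the Foo member. */
int store_and_read(int value)
{
    FooStorage s;
    Foo tmp = { .a = value, .b = NULL };

    memcpy(s.buf, &tmp, sizeof tmp);  /* write through the byte member */
    return s.foo.a;                   /* read through the Foo member */
}
```

Calling store_and_read(44) copies a Foo's bytes into the union and returns the a member of that same storage.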

> unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

Rather, any object may be accessed using a character type but not the other way around. This is because of two special rules:

C17 6.5 §7 ("the strict aliasing rule"):

> An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> /--/
> - a character type.

As well as C17 6.3.2.3 §7:

> When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
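
To illustrate both directions, here is a small sketch. The function names count_nonzero_bytes and foo_from_bytes are made up for this example. The first function reads a Foo byte by byte, which the character type rule permits. The second goes the other way without a cast, by copying the bytes into a real Foo object with memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Allowed: inspect any object's bytes through an unsigned char lvalue. */
size_t count_nonzero_bytes(const Foo *f)
{
    assert(f != NULL);
    const unsigned char *bytes = (const unsigned char *)f;
    size_t n = 0;

    for (size_t i = 0; i < sizeof *f; i++) {
        if (bytes[i] != 0) {
            n++;
        }
    }
    return n;
}

/* The reverse direction without aliasing problems: copy the bytes from a
   character buffer into a real Foo object instead of casting the buffer.
   The buffer must hold at least sizeof(Foo) bytes. */
Foo foo_from_bytes(const unsigned char *buffer)
{
    assert(buffer != NULL);
    Foo out;

    memcpy(&out, buffer, sizeof out);
    return out;
}
```

Compilers commonly turn such a fixed-size memcpy into plain loads and stores, so this form usually costs nothing over the cast.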


Score: 3







Strict aliasing is well discussed in other questions, e.g.: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

Hence, there is no "if this is undefined behavior". It is, and the compiler is not required to detect it (by warning or raising an error). End of answer.

However, we can look at what some of the potential problems really are.

The most obvious is alignment. A buffer allocated as

   char buffer[sizeof(struct T)];

would probably have no alignment requirement while struct T may very well have alignment requirements. On some systems, this means that if buffer is accessed as a struct T, performance will be impacted. On some systems, it may result in an invalid assembler instruction giving a hard fault on execution.

This can be tricky to catch because code could be working perfectly fine (since buffer by coincidence is aligned correctly) and suddenly after an unrelated code change, the alignment changes and the program crashes. This problem can be avoided by using the _Alignas modifier as in the question.
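
A rough sketch of how the alignment of a buffer could be checked at run time. The layout of struct T and the names is_aligned and aligned_buffer are assumptions for illustration. Converting a pointer to uintptr_t is implementation-defined, although on mainstream platforms it yields the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the text above does not define struct T. */
struct T {
    int a;
    double d;
};

/* Returns true if ptr is a multiple of alignment (which must be a power of two). */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* A byte buffer that is forced to the alignment of struct T. */
_Alignas(struct T) unsigned char aligned_buffer[sizeof(struct T)];
```

A check such as assert(is_aligned(p, _Alignof(struct T))) before a cast catches misalignment early, but it does nothing about the aliasing problem described next.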

Another problem is code optimization. Suppose that we have

   _Alignas(struct T) char buffer[sizeof(struct T)] = {0};
   struct T * p = (struct T *)&buffer[0];
   p->a = 1;

Due to the strict aliasing rule, the compiler is allowed to assume that the statement p->a = 1 does not change buffer. Thus, for the sake of this code snippet, that assignment can be optimized away since *p is otherwise unused.

This issue can be tricky because the code may work perfectly for years until some day a new version of the compiler is used where more aggressive optimization has been implemented and suddenly the code stops working (a common symptom of undefined behavior).

So a bump allocator needs to have an interface just like e.g. malloc where a void * is returned which is guaranteed to fulfil the alignment requirement of any type [which the allocator may be used for]. This pointer is then assigned to a pointer of the specific type and afterwards that memory area is only referred to using that type.
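
A minimal sketch of such an interface, under one extra assumption that is not stated above: the backing storage is obtained from malloc rather than declared as an array. Memory from malloc has no declared type, so the first store through a typed pointer sets the effective type (C17 6.5 §6). Arena, arena_init, arena_alloc and arena_destroy are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: base points to malloc'ed storage with no declared type. */
typedef struct Arena {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing allocation failed. */
int arena_init(Arena *arena, size_t capacity)
{
    arena->base = malloc(capacity);
    arena->capacity = arena->base != NULL ? capacity : 0;
    arena->used = 0;
    return arena->base != NULL;
}

/* Returns a block aligned for any object type, or NULL when the arena is full. */
void *arena_alloc(Arena *arena, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    /* Round the current offset up to the next multiple of align. */
    size_t start = (arena->used + align - 1) / align * align;

    if (start > arena->capacity || size > arena->capacity - start) {
        return NULL;
    }
    arena->used = start + size;
    return arena->base + start;
}

/* Releases the whole arena at once; individual blocks are never freed. */
void arena_destroy(Arena *arena)
{
    free(arena->base);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

A caller would write something like Foo *f = arena_alloc(&arena, sizeof *f); and from then on access that block only through f, as described above.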


Score: 0






Your code would be conforming, though probably not strictly conforming.

Compilers that are designed and configured to support low-level programming constructs will support constructs like yours without regard for whether the Standard requires them to do so. Compiler configurations that are not designed to be suitable for low-level programming tasks should not be used to perform such tasks. The fact that code such as yours is unsuitable for use with compilers that are unsuitable for low-level programming tasks does not imply any defect in your code, nor the compiler. Instead, it's a consequence of the fact that the Standard deliberately allows implementations specialized for some kinds of tasks to process code in ways that make them unsuitable for many other kinds of tasks.

The "strict aliasing rules" were never intended to partition the universe of programming constructs into those which all implementations must support and those which no programmers must ever use. Instead, the intention was to avoid forbidding compiler writers from performing aliasing optimizations in ways that would be maximally useful to their primary users/customers, recognizing that people wanting to sell compilers would be better able to judge their customers' needs than the Committee ever could.

The Standard makes no effort to enumerate all of the constructs that its authors thought it obvious compilers should support. Aside from the scenario of an lvalue being used to access an object of its own type, all of the constructs that the Standard lists as allowable are things for which, absent such a requirement, someone sincerely trying to write a good compiler might plausibly decide support wasn't necessary.

  • Published on 2023-07-06 13:45:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76625829.html



