Is having a buffer of unsigned char and treating it as a T* a violation of strict aliasing?

Question

Some sources say the following is a violation of strict aliasing:

#include <stdlib.h>

typedef struct Foo {
    int a;
    double* b;
} Foo;
int main() {

    _Alignas(Foo) unsigned char buffer[2048];
    Foo* a = (Foo*)&buffer[0];
    a->a = 44;
    a->b = NULL;
}

GCC at least does not report an error:
https://godbolt.org/z/Tbzaodb8W

If this is undefined behaviour, how would any allocator be implemented, especially bump allocators that make use of such an unsigned char buffer?

I certainly know that

Foo MyFoo;
...
unsigned char* byteRepresentationOfFoo = (unsigned char*)&MyFoo;

is allowed, since unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

Answer 1

Score: 4

Yes, it is a strict aliasing violation and therefore undefined behavior. The declared type is unsigned char [2048], so the "effective type" is the same (C17 6.5 §6).

a->a = etc. is an lvalue expression that accesses the object through a type that is not compatible with its effective type, unsigned char (C17 6.5 §7).

As a side note, it would have been OK to do this if Foo were a struct/union containing a character type array among its members (C17 6.5 §7), but in this case it is not.

> GCC at least does not report an error

It's not gcc's job to report undefined behavior. The various strict aliasing warnings have always been pretty broken, and gcc is also known to be lax when it comes to warning about non-standard extensions in general. clang doesn't emit any diagnostics either.

Related: https://stackoverflow.com/questions/65842647/why-did-gcc-stop-warning-about-strict-aliasing-violation-from-version-7-2 The answer is likely "because bugs". In recent times, many poorly tested changes have been rolled out to the compiler. It took gcc 28 years to get from version 1 to version 5, but since then there has been one major release per year, up to 13.x. They rolled out 8 major versions from 2015 to 2023, while the C language itself received only 1 minor revision in that time.


> how would any allocator be implemented

It can't, or at least not by taking a raw character buffer and wildly casting pointers into it. malloc and the like are library functions which may be implemented in non-standard C or in another language entirely.

As it happens, most standard libraries, glibc among them, are actually written in C. But if you compile them with a strict C compiler rather than with specialized options like gcc -fno-strict-aliasing, then there are no guarantees. Follow the build instructions of the library implementors.

You can, however, implement allocators by type punning through unions:

typedef union
{
  Foo foo;
  unsigned char buf[n];
} pun_intended_t;

_Alignas(Foo) pun_intended_t pun = { .buf =  { something } };
Foo* f = (Foo*)pun.buf; // well-defined as long as aligned

This utilizes another exception from "strict aliasing".
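
As a minimal sketch of the union idea, the following assumes a hypothetical FooSlot type and foo_slot_init helper (neither name comes from the answer). It takes the conservative route of going through the union's Foo member rather than casting the byte array, so it does not depend on how one reads the aggregate/union exception; the byte view remains available through buf.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Hypothetical storage slot: the union gives the bytes a declared type
   that has Foo among its members, and it inherits Foo's alignment. */
typedef union FooSlot {
    Foo foo;
    unsigned char buf[sizeof(Foo)];
} FooSlot;

/* Initializes the Foo that lives inside the slot and returns it. */
static Foo *foo_slot_init(FooSlot *slot, int a)
{
    assert(slot != NULL);
    Foo *f = &slot->foo;   /* member access, no cast from the byte array */
    f->a = a;
    f->b = NULL;
    return f;
}
```

A caller would declare a FooSlot (or an array of them), call foo_slot_init(&slot, 44), and use the returned Foo* from then on.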


> unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

Rather, any object may be accessed using a character type but not the other way around. This is because of two special rules:

C17 6.5 §7 ("the strict aliasing rule"):

> An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> /--/
> - a character type.

As well as C17 6.3.2.3 §7

> When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
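
The permitted direction can be sketched as follows. byte_sum is a hypothetical helper, not something from the answer; it walks the bytes of an arbitrary object through an unsigned char pointer, which is exactly what the two quoted rules allow.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes of any object's representation. Reading through
   unsigned char is the direction the character-type rule permits. */
static unsigned long byte_sum(const void *obj, size_t size)
{
    assert(obj != NULL);
    const unsigned char *bytes = (const unsigned char *)obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum += bytes[i];   /* each access has character type */
    }
    return sum;
}
```

Calling byte_sum(&MyFoo, sizeof MyFoo) on the Foo from the question is fine, whereas the question's cast goes the other way and accesses char storage through a Foo lvalue.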

Answer 2

Score: 3

Strict aliasing is discussed in depth in other questions, e.g.: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

Hence, there is no "if this is undefined behavior". It is, and the compiler is not required to detect it (warn or raise an error). End of answer.

However, we can look at what some of the potential problems really are.

The most obvious is alignment. A buffer allocated as

   char buffer[sizeof(struct T)];

would probably have no alignment requirement while struct T may very well have alignment requirements. On some systems, this means that if buffer is accessed as a struct T, performance will be impacted. On some systems, it may result in an invalid assembler instruction giving a hard fault on execution.

This can be tricky to catch because code could be working perfectly fine (since buffer by coincidence is aligned correctly) and suddenly after an unrelated code change, the alignment changes and the program crashes. This problem can be avoided by using the _Alignas modifier as in the question.
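
A quick way to make the alignment assumption visible is a runtime check. This is only a sketch: is_aligned is a hypothetical helper, and it assumes the usual flat mapping from pointers to uintptr_t, which the standard leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Returns 1 if p is a multiple of align (which must be a power of two),
   0 otherwise. The pointer-to-integer mapping is implementation-defined. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

With the declaration from the question, is_aligned(buffer, _Alignof(Foo)) holds by construction; without _Alignas it may or may not hold, which is exactly the "works by coincidence" situation described above. Note that passing this check says nothing about aliasing.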

Another problem is code optimization. Suppose that we have

   _Alignas(struct T) char buffer[sizeof(struct T)] = {0};
   struct T * p = (struct T *)&buffer[0];
   p->a = 1;
   transmit(buffer);

Due to the strict aliasing rule, the compiler is allowed to assume that the statement p->a = 1 does not change buffer. Thus, in this code snippet, that assignment can be optimized away, since *p is otherwise unused.

This issue can be tricky because the code may work perfectly for years until some day a new version of the compiler is used where more aggressive optimization has been implemented and suddenly the code stops working (a common symptom of undefined behavior).
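
One common way to sidestep this, sketched here under assumptions, is to build the value in a real struct T object and copy its bytes with memcpy. The struct T layout and the pack_t/unpack_t helpers are hypothetical stand-ins; the answer does not define them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message type standing in for struct T. */
struct T {
    int a;
    int b;
};

/* Builds the value in a real struct T, then copies its bytes into out.
   memcpy works on the bytes, so no struct T lvalue ever touches the
   character buffer. */
static void pack_t(unsigned char *out, int a, int b)
{
    assert(out != NULL);
    struct T tmp = { .a = a, .b = b };
    memcpy(out, &tmp, sizeof tmp);
}

/* Reverse direction: copies the bytes back into a struct T object. */
static struct T unpack_t(const unsigned char *in)
{
    assert(in != NULL);
    struct T tmp;
    memcpy(&tmp, in, sizeof tmp);
    return tmp;
}
```

After pack_t(buffer, 1, 0), the compiler has to treat buffer as modified, so a following transmit(buffer) sees the new bytes. Compilers typically lower such small fixed-size memcpy calls to plain loads and stores.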

So a bump allocator needs an interface just like that of e.g. malloc: it returns a void * that is guaranteed to fulfil the alignment requirement of any type [that the allocator may be used for]. This pointer is then assigned to a pointer of the specific type, and afterwards that memory area is referred to only through that type.
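
A minimal sketch of such an interface follows. The Arena type and the arena_* functions are hypothetical names, and the design makes one deliberate assumption: the backing storage comes from malloc rather than from a declared unsigned char array. Allocated storage has no declared type, so the first store through a Foo * gives that region the effective type Foo (C17 6.5 §6), and the problem from the question does not arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena; base is used for pointer arithmetic only. */
typedef struct Arena {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing allocation failed. */
static int arena_init(Arena *arena, size_t capacity)
{
    assert(arena != NULL);
    arena->base = malloc(capacity);
    arena->capacity = capacity;
    arena->used = 0;
    return arena->base != NULL;
}

/* Bumps the offset, rounded up so that every returned pointer is
   suitably aligned for any object type, like malloc's result. */
static void *arena_alloc(Arena *arena, size_t size)
{
    assert(arena != NULL && arena->base != NULL);
    const size_t align = _Alignof(max_align_t);
    size_t start = (arena->used + align - 1) / align * align;
    if (start > arena->capacity || size > arena->capacity - start) {
        return NULL;   /* out of space */
    }
    arena->used = start + size;
    return arena->base + start;
}

static void arena_destroy(Arena *arena)
{
    assert(arena != NULL);
    free(arena->base);
    arena->base = NULL;
    arena->capacity = arena->used = 0;
}
```

Usage mirrors malloc: Foo *f = arena_alloc(&arena, sizeof *f); followed by a NULL check, after which that region is accessed only as a Foo.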

Answer 3

Score: 0

Your code would be conforming, though probably not strictly conforming.

Compilers that are designed and configured to support low-level programming constructs will support constructs like yours without regard for whether the Standard requires them to do so. Compiler configurations that are not designed to be suitable for low-level programming tasks should not be used to perform such tasks. The fact that code such as yours is unsuitable for use with compilers that are unsuitable for low-level programming tasks does not imply any defect in your code, nor the compiler. Instead, it's a consequence of the fact that the Standard deliberately allows implementations specialized for some kinds of tasks to process code in ways that make them unsuitable for many other kinds of tasks.

The "strict aliasing rules" were never intended to partition the universe of programming constructs into those which all implementations must support and those which no programmer may ever use. Instead, the intention was to avoid forbidding compiler writers from performing aliasing optimizations in ways that would be maximally useful to their primary users/customers, recognizing that people wanting to sell compilers would be better able to judge their customers' needs than the Committee ever could.

The Standard makes no effort to enumerate all of the constructs that its authors thought it obvious compilers should support. Aside from the scenario of an lvalue being used to access an object of its own type, all of the constructs that the Standard lists as allowable are things for which, absent such a requirement, someone sincerely trying to write a good compiler might plausibly decide support wasn't necessary.

huangapple
  • Posted on July 6, 2023, 13:45:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76625829.html