Is passing pointer to an array as pointer to pointer UB in C?

Question

I have the following code:

#include <stdlib.h>
#include <stdio.h>

void func(int **b)
{
	printf("b = %p\n", b); // 0x7ffe76932330
	*b = *b + 1;
}

int main(void)
{
	int b[10] = {0};

	printf("b = %p\n", &b[0]); // 0x7ffe76932330
	printf("%d\n", b[0]);      // 0

	func(&b);

	printf("%d\n", b[0]); // 4
	return 0;
}

Does this code have UB? It seems so to me, if only because the types differ and there is no explicit cast: int (*)[10] is not int **.

Also, what if I have char b[] = "some string"; instead? The behavior is almost the same, which seems weird.

Answer 1

Score: 5

Passing the pointer by itself isn't necessarily undefined behavior, but subsequently using the converted pointer is.

C allows converting a pointer to one object type into a pointer to another object type and back, as documented in section 6.3.2.3p7 of the C standard:

> A pointer to an object type may be converted to a pointer to a
> different object type. If the resulting pointer is not correctly
> aligned for the referenced type, the behavior is undefined.
> Otherwise, when converted back again, the result shall compare equal
> to the original pointer. When a pointer to an object is converted to a
> pointer to a character type, the result points to the lowest addressed
> byte of the object. Successive increments of the result, up to the
> size of the object, yield pointers to the remaining bytes of the
> object.

So assuming there's no alignment issue (e.g. the array starts at an 8-byte-aligned address on a 64-bit system), merely passing an int (*)[10] to a function expecting an int ** is allowed, although most compilers will warn about converting between incompatible pointer types.

The undefined behavior happens here:

*b = *b + 1;

That statement accesses an object through an lvalue of an incompatible type (one that is not a character type). The rules on which lvalue types may be used to access an object are listed in section 6.5p7:

> An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:
> - a type compatible with the effective type of the object,
> - a qualified version of a type compatible with the effective type of the object,
> - a type that is the signed or unsigned type corresponding to the effective type of the object,
> - a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
> - an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
> - a character type.

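Dereferencing an int (*)[10] that has been converted to int ** reads the array's int elements through an lvalue of type int *, which doesn't meet any of the above criteria, so evaluating *b is undefined behavior.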

Answer 2

Score: 2

An array is not a pointer, so the pointer to an array that you pass to func with func(&b) is not a pointer to a pointer. It is a pointer to an array, an unusual type that is produced when passing an array of arrays to a function (int b[10][10] defines an array of arrays of int).

Passing &b to func involves a conversion between pointer types, something the C Standard allows but programmers should be careful about. The compiler issues a warning if configured properly: -Wall -Werror is recommended for gcc and clang.

Regarding the undefined behavior itself: you pass &b to func, which expects an int **. The compiler converts &b from its type, int (*)[10], to int **, which might have a different alignment requirement. Indeed, b is aligned on the width of int (usually 4 bytes), whereas int * may require 8-byte alignment, as is the case on most 64-bit systems.

The C23 standard specifies this conversion as having undefined behavior when the result is misaligned:

> 6.3.2.3 Pointers:
>
> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

So when the alignment requirement is not met, the Standard describes the conversion itself as having undefined behavior, before any dereference takes place.

If int * and int have the same alignment requirements, which is the case for example on 32-bit systems, you don't get undefined behavior when passing &b to func, but you do when evaluating the expression *b = *b + 1; because:

> 6.5 Expressions
>
> An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> - a type compatible with the effective type of the object,
> - a qualified version of a type compatible with the effective type of the object,
> - a type that is the signed or unsigned type corresponding to the effective type of the object,
> - a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
> - an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
> - a character type.

Hence dereferencing b in *b = *b + 1 has undefined behavior. For illustration, you can try calling func((int **)(b + 1)), which passes an address offset by one int and therefore possibly misaligned for int *, to check whether the undefined behavior becomes more visible (the program might exit with a bus error).

Also note that printf expects a void * for %p, so b and &b[0] must be cast to (void *) to avoid two more instances of undefined behavior.

Answer 3

Score: 1

The type of the expression &b is int (*)[10]. There is no implicit conversion from a pointer of type int (*)[10] to a pointer of type int **, so the compiler should issue a diagnostic for this statement:

func(&b);

Even if you cast the argument expression explicitly:

func((int **)&b);

dereferencing the resulting pointer can still invoke undefined behavior. The expression &b used in the function call has the same address value as the first element of the array.

So within the function, the expression *b yields either the value of the first element of the passed array (if sizeof(int *) equals sizeof(int), for example when both are 4) or the combined value of the first two elements (if sizeof(int *) equals 2 * sizeof(int), for example when a pointer is 8 bytes and an int is 4 bytes).

That is, the expression *b will not contain a valid address.

Thus this statement:

*b = *b + 1;

does not make sense. In the provided example, because the array is zero-initialized, the expression *b can produce a null pointer. You could test this inside the function, for example:

printf("*b == NULL is %s\n", *b == NULL ? "true" : "false");

The same problem occurs if you use a character array the same way:

char b[] = "some string";

Instead, you could write, for example:

int b[10] = {0};

int *pb = b;

func(&pb);

In this case, the expression *b within the function points to the first element of the passed array, and this statement:

*b = *b + 1;

increments that pointer, so that it now points to the second element of the array.

huangapple
  • Posted on June 30, 2023, at 02:04:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76583577.html