What are use cases for writing (&var + 1) if var is not an array element?
Question
Writing (&var + 1) computes a pointer one past the end of an object, which the C standard permits as long as the result is not dereferenced. Use cases include allocating several objects with a single call to malloc and storing variable-sized objects sequentially in memory.
Recently I learned from user "chux" that it is legal to add 1 to an address that doesn't represent an array element. Specifically, the following provision in the standard (C17 draft, 6.5.6 ¶7)
> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
makes it legal to write &var + 1 where var is not representable as arr[i] for some T arr[n] where 0 ≤ i < n.
What are use cases for doing this? I found an example by Aaron Ballman (on the SEI CERT C Coding Standard website) who mentions "allocation locality". Without quoting his entire example, the essence seems to be that one can allocate space for multiple objects using a single call to malloc, so that one can assign to them like this:
T1 *objptr1 = (T1 *)malloc(sizeof(T1) + sizeof(*objptr2));
*objptr1 = ...;
memcpy(objptr1 + 1, objptr2, sizeof(*objptr2));
Here is a toy example of mine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    float obj2 = 432.1;
    /* One allocation holds a long followed directly by a float. */
    long *objptr1 = (long *)malloc(sizeof(*objptr1) + sizeof(obj2));
    if (objptr1 == NULL)
        return EXIT_FAILURE;
    *objptr1 = 123456789L;
    memcpy(objptr1 + 1, &obj2, sizeof(obj2));
    printf("%ld\n", *objptr1); // 123456789
    printf("%f\n", *(float *)(objptr1 + 1)); // 432.100006
    free(objptr1);
    return 0;
}
I hope that this captures the essence of the idiom. (Perhaps it does not: As a commenter pointed out, my toy example assumes that the alignment of float is smaller than or equal to the alignment of long. The original example by Aaron Ballman had a string as the second object, and strings can be arbitrarily aligned. For a correct minimal (toy) version of Aaron Ballman's code stub see my own answer here.)
However, it seems that one could also simply use a (char *)-cast with sizeof instead:
memcpy((char *)objptr1 + sizeof(*objptr1), &obj2, sizeof(obj2));
In the general case, &var + 1 is shorter than (char *)&var + sizeof var, so perhaps this is the advantage.
But is that all? What are use cases for writing (&var + 1) if var is not an array element?
Answer 1
Score: 4
> What are use cases for writing (&var + 1) if var is not an array element?
Not everything that falls out of the language semantics has a specific use. Most computer languages are designed for consistency and sufficiency. Some also aim for simplicity. Few, however, expressly target minimality, and C is not one of them.
The primary reason that pointer arithmetic is defined for pointers to scalars is that it makes it easier to define pointer arithmetic. Pointers to scalars are not a special case, which is good, because it's not necessarily possible to distinguish them from pointers to array elements (alternatively: implementations don't need to make that possible). Furthermore, making pointers to scalars equivalent to pointers to the single element of a one-element array is unproblematic, because the pointer types are the same and the representation of a scalar is identical to the representation of a one-element array of the same data type.
Given that pointer arithmetic is defined for pointers to scalars by relying on a semantic equivalence between scalars and single-element arrays, the use cases for &scalar + 1 are exactly the same as those for &single_element_array[0] + 1, in contexts where one wants to lean on that semantic equivalence. In turn, those cases are pretty much the same as the ones for &n_element_array[n-1] + 1 generally.
Perhaps a better question, then, would be why the language allows computing a pointer to just past the end of an array, and what use that might have. As far as I am aware or have ever been able to determine, those are primarily a matter of convenience. For example, it is easier to iterate over an array via pointers if you are permitted to compute (but not dereference) a pointer to just past the end of the array. And it is desirable to be able to express sub-arrays via an [inclusive_start, exclusive_end) pointer pair. Neither of those things is essential, however.
Answer 2
Score: 3
If you have a 'real' array, you might write:
enum { N = 10 };
int arr[N];
/* …set the values in arr… */
int *end = arr + N;
for (int *cur = arr; cur < end; cur++)
{
    /* …use *cur… */
}
You can do the same with a single variable:
int var;
int *end = &var + 1;
for (int *cur = &var; cur < end; cur++)
{
    /* …use *cur… */
}
You would probably have the loop hidden in a function, possibly one that is passed the start of the array and a pointer one past its end:
some_func(&arr[0], &arr[N]);
some_func(&var, &var + 1);
The same code can be used for both the ordinary variable and the normal array.
You could also pass the function the start of the array and the length, and the function could do the arithmetic:
another_func(arr, N);
another_func(&var, 1);
with:
void another_func(int *base, size_t size)
{
    for (int *end = base + size; base < end; base++)
    {
        /* …process *base…aka base[0]… */
    }
}
All the code using var depends on being able to create the address &var + 1, though none of it accesses the data at that address.
Answer 3
Score: 1
The reason is to make full pointer arithmetic valid for individual variables that are not arrays, so that they can be used wherever an array is expected.
For example, let's say that we want to read() bytes from stdin, issuing an individual read() per character. read() requires an array of char to be passed to it, but you are not going to define an array of just one char only to be able to use it with read(). In that case:
    /* indentation used to indicate local, automatic scope */
    char the_char;
    int res = read(0, &the_char, 1);
will allow read() internally to move the pointer to the end of the array without knowing that you have actually passed a single char variable. If the standard did not say so explicitly, you would have had to write:
    char the_char[1];
    int res = read(0, the_char, 1);
but then you would have to write the_char[0] everywhere to refer to the character read, instead of just the_char (degrading the readability of your code).
Internally, read() can handle the buffer pointer as a pointer and create a loop based on the pointer positions:
for (char *p = buffer, * const end = buffer + len;
     p < end;
     p++)
{
    /* something applying to *p */
}
or
for (int i = 0; i < len; i++) {
    /* something applying to each buffer[i] */
}
In the first case, it applies to the pointed character that is referenced by the moving pointer. In the second, an auxiliary variable i is used to access the array elements sequentially.
Normally, the first version is more efficient as written, because the pointer is moved on each iteration and the element is accessed by just dereferencing the pointer. In the second case, an auxiliary variable is created (for better readability), but each access has to be computed as a variable offset from the beginning of the array on every iteration. After the compiler's optimizer has run, both versions normally reduce to the same assembly code, so which version you use normally makes no difference.
Answer 4
Score: 0
Before flexible array members were introduced in C99, one way of emulating a struct with strings of indeterminate length inside would be to use a pointer within the struct to a string allocated to be directly after the struct.
A correct minimal (toy) version of Aaron Ballman's code stub illustrates the use of incrementing a pointer to something that is not an array element:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
struct rec {
    int a; /* dummy member */
    char *varstr;
    int b; /* dummy member */
};
struct rec *create_rec(const char *s) {
    struct rec *r;
    size_t len = strlen(s) + 1;
    r = malloc(sizeof(*r) + len); /* implicit conversion from void * to struct rec * is okay */
    if (r == NULL)
        return NULL;
    r->varstr = (char*)(r + 1); /* casting from struct rec * to char * is okay */
    memcpy(r->varstr, s, len);
    return r;
}
int main(void) {
    struct rec *my_r;
    my_r = create_rec("this is a test");
    if (my_r == NULL)
        return EXIT_FAILURE;
    my_r->a = 9;
    my_r->b = 321;
    printf("%d\n", my_r->a); /* 9 */
    puts(my_r->varstr); /* this is a test */
    printf("%d\n", my_r->b); /* 321 */
    free(my_r); /* releases the struct and the string in one call */
    return 0;
}
The statement r->varstr = (char*)(r + 1); treats the struct rec object that r points to as if it were the sole element of a one-element array, struct rec my_r[1]. (Of course, one could instead write r->varstr = (char*)r + sizeof(*r);, which doesn't rely on this trick and works equally well.)