In a C++ array of structs, is it legal to iterate/slice over corresponding data members using pointer arithmetic?

Question

In an array of structs of type S, the offset in bytes between corresponding data members of neighboring elements is always sizeof(S). Is it legal, according to the C++ standard, to jump directly from one data member to the next corresponding one by adding sizeof(S) to its address, as in this example?

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <cstdint>   // std::uintptr_t
#include <vector>
#include <string>
#include <iostream>

using namespace std;

struct S
{
    vector<int> v;
    string s;
    int i;
    char c;
};

void print_all(const string* s, size_t n, ptrdiff_t stride_in_bytes)
{
    while (n > 0)
    {
        cout << *s << "\n";
        --n;
        // jump over any other data from string to string
        s = reinterpret_cast<const string*>(reinterpret_cast<uintptr_t>(s)+stride_in_bytes);
    }
}

int main(int, char**)
{
    S my_array[3];
    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";
    print_all(&my_array[0].s, 3, sizeof(S));
    return 0;
}

This would enable using the same non-templated print_all() function for all kinds of structs.

It appears to work with GCC and MSVC, but is it legal according to the standard?

Answer 1

Score: 3

Conversions between pointer types and integer types (of sufficient size) are entirely implementation-defined, with one exception: converting a pointer to such an integer and then back to the original pointer type reproduces the original pointer value.

So you'll need to read your compiler's documentation, which should describe the behavior.
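A minimal sketch of the one guarantee described above, assuming `std::uintptr_t` is available (the standard makes it optional, although mainstream compilers provide it). The function name `round_trip` is hypothetical; the point is that only the unmodified round trip is covered, not arithmetic performed on the integer in between.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper illustrating the round-trip rule: a pointer converted
// to std::uintptr_t and back to the same pointer type has its original value.
const std::string* round_trip(const std::string* p)
{
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    // Adding a byte stride to `bits` here, as the question does, is outside
    // this guarantee: the pointer/integer mapping is implementation-defined.
    const std::string* back = reinterpret_cast<const std::string*>(bits);
    assert(back == p);  // guaranteed by the round-trip rule
    return back;
}
```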

However, if you had used (unsigned) char* instead of uintptr_t, the code would have undefined behavior. Specifically, the pointer resulting from the cast still points to the original string object (or to its object representation). The subsequent pointer arithmetic is then undefined either because the T in the pointer type T* is not similar to the type of the object actually pointed to (std::string), or because it would move the pointer (far) beyond the object representation of the std::string object. Pointer arithmetic is only allowed within an array, up to and including one past the last element, and only if the pointer type is similar to the array's element type. (For this purpose, an object that is not an array element is treated as the sole element of a one-element array.) Any other pointer arithmetic is undefined behavior in itself.
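For contrast, a sketch of traversal that stays within these rules, assuming a simplified `S` with only the members needed here and a hypothetical helper `concat_s`: the arithmetic is performed on `const S*`, whose pointee type matches the real array's element type, and the string member is reached through ordinary member access.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the question's S (assumption: only these members).
struct S
{
    std::string s;
    int i;
};

// Well-defined: pointer arithmetic on `const S*` within the S array, never
// going past one beyond the last element; members are then accessed normally.
std::string concat_s(const S* first, std::size_t n)
{
    assert(first != nullptr || n == 0);
    std::string out;
    for (const S* p = first; p != first + n; ++p)  // first + n: one past the end, allowed
        out += p->s;
    return out;
}

// By contrast, stepping a `const std::string*` (or `const unsigned char*`)
// obtained from &arr[0].s forward by sizeof(S) bytes is the undefined
// arithmetic described above.
```

The caller passes a pointer to the array itself (`concat_s(arr, 3)`), not a pointer to a member, which is what makes each step legal.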

Since C++17 at least, there is no way to get from a pointer to one of the s members to any other member of the same S object, or to any other S object in the same array, without relying on implementation-specific behavior. A compiler would be allowed to optimize based on this assumption (even though I don't think any do).

So you shouldn't expect the uintptr_t variant to work either. There is a concept of pointer provenance, which means that pointer values carry information beyond just their address. How provenance should interact with integer representations of pointers isn't specified, but see e.g. P2318 for some approaches. Compilers use provenance to some degree for optimization purposes.

Answer 2

Score: 1

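Whether or not it is legal (I don't know for sure, but I suspect not), I would not suggest using that approach at all. A much safer and definitely legal option is to use a pointer-to-member instead, applying it to each struct instance in the array, e.g.: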

#include <vector>
#include <string>
#include <iostream>

using namespace std;

struct S
{
    vector<int> v;
    string s;
    string s2;
    int i;
    char c;
};

void print_all(const S* arr, size_t n, const string S::*member)
{
    while (n > 0)
    {
        cout << arr->*member << "\n";
        ++arr;
        --n;
    }
}

int main(int, char**)
{
    S my_array[3];

    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";

    my_array[0].s2 = "d";
    my_array[1].s2 = "e";
    my_array[2].s2 = "f";

    print_all(my_array, 3, &S::s);
    print_all(my_array, 3, &S::s2);

    return 0;
}

You can generalize this with templates:

#include <vector>
#include <string>
#include <iostream>

using namespace std;

struct S
{
    vector<int> v;
    string s;
    string s2;
    int i;
    char c;
};

template<typename T, typename U>
void print_all(const T* arr, size_t n, const U T::*member)
{
    while (n > 0)
    {
        cout << arr->*member << "\n";
        ++arr;
        --n;
    }
}

template<typename T, typename U, size_t N>
void print_all(const T (&arr)[N], const U T::*member)
{
    print_all(arr, N, member);
}

int main(int, char**)
{
    S my_array[3];

    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";

    my_array[0].s2 = "d";
    my_array[1].s2 = "e";
    my_array[2].s2 = "f";

    my_array[0].i = 1;
    my_array[1].i = 2;
    my_array[2].i = 3;
 
    my_array[0].c = 'g';
    my_array[1].c = 'h';
    my_array[2].c = 'i';

    print_all(my_array, &S::s);
    print_all(my_array, &S::s2);
    print_all(my_array, &S::c);
    print_all(my_array, &S::i);

    return 0;
}

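The question's goal was a single non-templated `print_all()`. One way to keep that without byte offsets, sketched here as an assumption rather than as part of this answer, is to have the caller supply an accessor through `std::function`. The helper `print_s_members` and the `std::ostream&` parameter (added so the output can be captured) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical non-template print_all: instead of a raw address and a byte
// stride, it takes an accessor returning a reference to the i-th string.
void print_all(std::ostream& os, std::size_t n,
               const std::function<const std::string&(std::size_t)>& at)
{
    assert(at && "accessor must not be empty");
    for (std::size_t i = 0; i < n; ++i)
        os << at(i) << "\n";
}

// Simplified stand-in for the question's S.
struct S
{
    std::string s;
    std::string s2;
    int i;
};

// Caller side: the lambda does ordinary, well-defined indexing into the S
// array, so the compiler type-checks which member is selected. The explicit
// return type keeps the lambda returning a reference, not a temporary copy.
std::string print_s_members(const S* arr, std::size_t n)
{
    std::ostringstream os;
    print_all(os, n, [arr](std::size_t i) -> const std::string& { return arr[i].s; });
    return os.str();
}
```

The trade-off is an indirect call per element through `std::function`; in exchange, the callee stays non-templated while the caller keeps full type checking.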

Answer 3

Score: 1

It's a nice idea and concept. However, you abandon type safety for a tricky calculation: mistakes in the arguments 3 and sizeof(S) can only be caught by the programmer, not by the compiler. It is definitely not worth it here. You can instead loop over the array, applying a field access to each element. Each case needs its own access expression, but that keeps type safety.

It could be justified if you were parsing the field offsets of a struct to stream data independently of alignment.

And yes, it is more or less legal.

huangapple
  • Posted on April 7, 2023 at 03:41:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75953208.html