April 7, 2023 03:41:42

In a C++ array of structs, is it legal to iterate/slice over corresponding data members using pointer arithmetic?

Question

In an array of structs of type S, the offset in bytes between neighboring corresponding data members is always sizeof(S). Is it legal, according to the C++ standard, to directly jump from one data member to the next corresponding one by adding sizeof(S) to the address, like in this example?

#include <cstddef>
#include <cstdint>
#include <vector>
#include <string>
#include <iostream>
using namespace std;
struct S
{
    vector<int> v;
    string s;
    int i;
    char c;
};
void print_all(const string* s, size_t n, ptrdiff_t stride_in_bytes)
{
    while (n > 0)
    {
        cout << *s << "\n";
        --n;
        // jump over any other data from string to string
        s = reinterpret_cast<const string*>(reinterpret_cast<uintptr_t>(s)+stride_in_bytes);
    }
}
int main(int, char**)
{
    S my_array[3];
    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";
    print_all(&my_array[0].s, 3, sizeof(S));
    return 0;
}

This would enable using the same non-templated print_all() function for all kinds of structs.

It appears to work with GCC and MSVC, but is it legal according to the standard?

Answer 1

Score: 3

Conversions between pointer types and integer types (of sufficient size) are completely implementation-defined, except that converting the integer value obtained by such a conversion from a pointer value back to the original pointer type will reproduce the original pointer value.

So you'll need to read your compiler's documentation, which should document the behavior.
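A minimal sketch of the one guarantee described above, assuming the implementation provides the optional `std::uintptr_t` type: converting a pointer to an integer and back, unchanged, yields the original pointer value. The helper name `round_trip` is hypothetical. Adding an offset to the integer before converting back, as the question's code does, falls outside this guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: pointer -> integer -> pointer with the same value.
// The integer value itself is implementation-defined; only this exact
// round trip back to the original pointer type is guaranteed.
inline const std::string* round_trip(const std::string* p)
{
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<const std::string*>(bits);
}
```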

However, if you had used (unsigned) char* instead of uintptr_t, then it would have undefined behavior. Specifically the pointer resulting from the cast will still point to the original string object or to its object representation. Then pointer arithmetic is either not allowed at all because T in the pointer type T* is not similar to the actual pointed-to object's type (std::string), or because it would increment the pointer (far) beyond the object representation of the std::string object. You can only do pointer arithmetic inside an array, including one-past the last element, and only if the pointer type is similar to that of the actual array. (Objects which are not elements of an array are considered to be elements of a one-element array for this purpose.) Any other pointer arithmetic has in itself undefined behavior.
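As an illustration of arithmetic that this rule does allow, here is a sketch with a hypothetical struct and helpers: a `const S*` moves over an array of `S` up to one past the end, and a single object is treated as a one-element array, so `&obj + 1` may be formed and compared but not dereferenced. The question's trick is different because it moves a `const std::string*` beyond the single `std::string` object it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct S
{
    std::string s;
    int i;
};

// Legal: the pointer type matches the array element type (const S*),
// and p only ranges over [arr, arr + n]; arr + n is one past the end
// and is compared against, never dereferenced.
inline int sum_i(const S* arr, std::size_t n)
{
    int total = 0;
    for (const S* p = arr; p != arr + n; ++p)
        total += p->i;
    return total;
}

// Legal: a non-array object counts as the element of a one-element array,
// so first + 1 is a valid one-past-the-end pointer (not dereferenceable).
inline bool one_past_single(const S& obj)
{
    const S* first = &obj;
    const S* last = first + 1;
    return last - first == 1;
}
```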

At least since C++17, there is no way to get from a pointer to one of the s members to any other members of the same S object, or to any other S object in the same array, without relying on implementation-specific behavior. The compiler would be allowed to optimize under this assumption (even if I don't think any do).

So it shouldn't be expected that the uintptr_t variant will work either. There is a concept of pointer provenance, which implies that pointer values carry information beyond their address. It isn't specified how this should interact with integer representations of pointers, but see e.g. P2318 for some approaches. Compilers use this to some degree for optimization purposes.

Answer 2

Score: 1

Whether or not it is legal (I don't know for sure, but I suspect not), I would not suggest using that approach at all. A much safer and definitely legal option would be to use a pointer-to-member instead, applying it to each struct instance in the array, e.g.:

#include <vector>
#include <string>
#include <iostream>
using namespace std;
struct S
{
    vector<int> v;
    string s;
    string s2;
    int i;
    char c;
};
void print_all(const S* arr, size_t n, const string S::*member)
{
    while (n > 0)
    {
        cout << arr->*member << "\n";
        ++arr;
        --n;
    }
}
int main(int, char**)
{
    S my_array[3];
    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";
    my_array[0].s2 = "d";
    my_array[1].s2 = "e";
    my_array[2].s2 = "f";
    print_all(my_array, 3, &S::s);
    print_all(my_array, 3, &S::s2);
    return 0;
}

You can generalize this with templates:

#include <vector>
#include <string>
#include <iostream>
using namespace std;
struct S
{
    vector<int> v;
    string s;
    string s2;
    int i;
    char c;
};
template<typename T, typename U>
void print_all(const T* arr, size_t n, const U T::*member)
{
    while (n > 0)
    {
        cout << arr->*member << "\n";
        ++arr;
        --n;
    }
}
template<typename T, typename U, size_t N>
void print_all(const T (&arr)[N], const U T::*member)
{
    print_all(arr, N, member);
}
int main(int, char**)
{
    S my_array[3];
    my_array[0].s = "a";
    my_array[1].s = "b";
    my_array[2].s = "c";
    my_array[0].s2 = "d";
    my_array[1].s2 = "e";
    my_array[2].s2 = "f";
    my_array[0].i = 1;
    my_array[1].i = 2;
    my_array[2].i = 3;

    my_array[0].c = 'g';
    my_array[1].c = 'h';
    my_array[2].c = 'i';
    print_all(my_array, &S::s);
    print_all(my_array, &S::s2);
    print_all(my_array, &S::c);
    print_all(my_array, &S::i);
    return 0;
}
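If a single non-template `print_all()` is still the goal, as in the question, one possible variation (a sketch, not part of this answer) is to hide the per-struct access behind `std::function`. The names `StringAccessor` and `member_accessor` are hypothetical, and the printer takes a `std::ostream&` so that the output can be captured.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical accessor type: returns the string stored in element i.
using StringAccessor = std::function<const std::string&(std::size_t)>;

// Non-template printer: it only sees the accessor, never the struct type,
// so no pointer arithmetic across objects is involved.
inline void print_all(std::ostream& out, std::size_t n, const StringAccessor& get)
{
    for (std::size_t i = 0; i < n; ++i)
        out << get(i) << "\n";
}

// Small template adapter binding an array of T and a pointer-to-member;
// arr[i].*member is ordinary, well-defined member access.
template<typename T>
StringAccessor member_accessor(const T* arr, const std::string T::* member)
{
    return [arr, member](std::size_t i) -> const std::string& {
        return arr[i].*member;
    };
}
```

For example, `print_all(std::cout, 3, member_accessor(my_array, &S::s));` would print the `s` member of each element. The trade-off is one indirect call per element through `std::function`, in exchange for a print_all() that does not depend on the struct type.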

Answer 3

Score: 1

It's a nice idea and concept. However, you abandon type safety for a tricky calculation. Errors in the values 3 and sizeof(S) can only be spotted by the programmer, not by the compiler. It is definitely not worth it here. You can instead loop over the structs and apply the field access to each one. Every iteration is a new access, but that keeps type safety.
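A sketch of the loop suggested above, assuming a struct `S` like the one in the question and a hypothetical `print_s` helper: the element count is deduced from the array type and the member is accessed by name, so neither 3 nor sizeof(S) has to be written by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct S
{
    std::vector<int> v;
    std::string s;
    int i;
    char c;
};

// Type-safe alternative to the stride trick: N is deduced from the array,
// and elem.s is checked by the compiler, so no manual count or byte offset.
template<std::size_t N>
void print_s(const S (&arr)[N], std::ostream& out = std::cout)
{
    for (const S& elem : arr)
        out << elem.s << "\n";
}
```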

It might make sense if you were parsing the field offsets of a struct to stream data independently of alignment.

And yes, it is more or less legal.
