Access class member array of unknown size through pointer to first member

Question

The code you provided creates a queue-like structure with nodes of different types and accesses their data arrays using type casting. While it seems to work and return the expected value, there are potential issues with type punning and strict aliasing rules in C++. Accessing data in this way can lead to undefined behavior, as the compiler might make certain assumptions about pointer types.

To make it legal and avoid potential issues, you can consider using std::memcpy or similar functions to copy the data from one memory location to another. This approach ensures that the data is properly copied, and it doesn't rely on pointer type casting.

Here's a modified version of your process_queue function using std::memcpy:

#include <cstring>

void process_queue(node_base* head)
{
    while (head)
    {
        // Address of the first byte directly after the node_base header.
        char* data = reinterpret_cast<char*>(head) + sizeof(node_base);

        for (int i = 0; i < head->size; ++i)
        {
            // Copy one byte into the array instead of assigning through a cast.
            const char value = static_cast<char>(i);
            std::memcpy(data + i, &value, sizeof value);
        }

        head = head->next;
    }
}

With std::memcpy the bytes are written without dereferencing a pointer of a mismatched type. The pointer arithmetic that computes the address after node_base is unchanged, however, so on its own this does not guarantee that the access is well-defined.

English original:

I want to be able to handle pointers to objects with an array member of unknown size, and access that array through a type-erased pointer to their common first member. My current attempt is the following:

#include <cstddef>

struct node_base
{
    node_base* next;
    int size;
};

template <int n>
struct node
{
    node_base base;
    char data[n];

    node() : base{nullptr, n} {
        static_assert(offsetof(node, data) == sizeof(node_base));
    }
};

void process_queue(node_base* head)
{
    while (head)
    {
        for (int i = 0; i < head->size; ++i)
        {
            *(reinterpret_cast<char*>(reinterpret_cast<char*>(head) + sizeof(node_base)) + i) = i;
        }

        head = head->next;
    }
}

int main()
{
    node<3> a{};
    node<4> b{};
    node<2> c{};

    c.base.next = &b.base;
    a.base.next = &c.base;

    process_queue(&a.base);

    return a.data[2] + c.data[1];
}

This code builds up a queue-like structure (nodes a, b and c pointing to each other as "a -> c -> b"), and passes a pointer to the first element to process_queue. That function then traverses the queue, accesses the node<n>::data array stored directly after the node_base member, and writes the values 0...n-1 into its entries.

The challenge is that the nodes have different types, so the queue's next pointers point to the node_base members of the actual nodes, and I need some way to get to the actual data from there.

Although this seems to work (Godbolt) in the sense that it successfully returns the expected value of 3, I am not sure whether this is allowed.

Question

Assuming I know by some method that the pointer cur points to the first member of an object with an array of size cur->size, is it legal to access the elements of the node<n>::data array by means of the code above? If not, can it be made legal without making sizeof(node<n>) larger?

Answer 1

Score: 1


Going strictly by the current standard it is undefined behavior already because the pointer arithmetic here:

reinterpret_cast<char*>(head) + sizeof(node_base)

is undefined. head is a pointer to a node_base object, which is not pointer-interconvertible with any char object at the location. Therefore reinterpret_cast<char*>(head) will also be a pointer to the same node_base object. As a consequence pointer arithmetic is undefined because the pointed-to type of the expression (char) is not similar to the actual type of the pointed-to object (node_base).


However, your intent with the cast to char* here is to change the pointer value. You intend to obtain a pointer to the object representation of the node<n> object. Casts to char* are commonly used to access the object representation, but the standard doesn't actually provide for that.

The proposal P1839 attempts to incorporate this intended behavior into the standard. With its current wording in revision P1839R5 it would still not make your program well-defined, for multiple reasons:

First, as noted in the limitations section of the proposal, only reinterpret_cast<unsigned char*> could be used to obtain a pointer to the object representation.

Even with unsigned char, there are still issues under the proposal:

Your classes happen to be standard-layout. That's a necessary condition for this to work at all. If they weren't standard-layout, then there generally wouldn't be any way to get from a pointer to one member to a pointer to another member.

But being standard-layout guarantees that the node<n> object is pointer-interconvertible with its first member subobject. As a consequence, under the proposal it is left open whether reinterpret_cast<unsigned char*>(head) will produce a pointer to the first element of the object representation of the node_base member or of the node<n> object. This is noted as an open issue in the proposal.
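The pointer-interconvertibility guarantee mentioned above can be illustrated with a small sketch. It assumes C++17 and that the caller already knows the concrete n behind a given node_base*; the helper names data_of and demo_interconvertible are hypothetical and not part of the question. It only shows the case that is well-defined today (casting back to the full node<n> type), not the type-erased access the question asks about.

```cpp
#include <cassert>
#include <type_traits>

struct node_base
{
    node_base* next;
    int size;
};

template <int n>
struct node
{
    node_base base;
    char data[n];

    node() : base{nullptr, n} {}
};

// node<n> must be standard-layout for the cast below to be valid.
static_assert(std::is_standard_layout_v<node<3>>);

// Hypothetical helper: recover the data array when n is known.
// A standard-layout class object is pointer-interconvertible with its
// first non-static data member, so this reinterpret_cast yields a
// pointer to the enclosing node<n> object.
template <int n>
char* data_of(node_base* head)
{
    assert(head->size == n); // the caller's claim about the actual node type
    return reinterpret_cast<node<n>*>(head)->data;
}

// Hypothetical demo: write through the recovered pointer, then read back.
int demo_interconvertible()
{
    node<3> a{};
    char* d = data_of<3>(&a.base);
    for (int i = 0; i < 3; ++i)
        d[i] = static_cast<char>(i);
    return a.data[2];
}
```

This requires the size to be a compile-time constant at the call site, which is exactly what the type-erased queue in the question tries to avoid.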

Assuming it did, however, produce a pointer to the object representation of the node<n> object as you intend, the next question would be whether reinterpret_cast<unsigned char*>(head) + sizeof(node_base) + i would also be a pointer into the object representation of the char array member of node<n>. I am not sure what the proposal intends for this.

But even if that wasn't an issue, the proposal defines only how it is possible to read from the object representation. Writing to it is out-of-scope and still UB under the proposal.

So at the very least you would need to keep the outer reinterpret_cast<char*> and wrap it in a call to std::launder in order to obtain a pointer to the char object itself (rather than its object representation or the object representation of the node<n> object).
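As a rough sketch of what that last suggestion might look like in the question's process_queue, assuming C++17 (std::launder lives in <new>): the function demo_launder is a hypothetical stand-in for the question's main. Per the reasoning above, the address computation itself is still not sanctioned by the current standard, so this only illustrates the shape of the change and is not a guarantee of well-defined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <new> // std::launder (C++17)

struct node_base
{
    node_base* next;
    int size;
};

template <int n>
struct node
{
    node_base base;
    char data[n];

    node() : base{nullptr, n} {
        static_assert(offsetof(node, data) == sizeof(node_base));
    }
};

void process_queue(node_base* head)
{
    while (head)
    {
        // Same address computation as in the question; the answer argues
        // this arithmetic is undefined under the current standard.
        char* raw = reinterpret_cast<char*>(head) + sizeof(node_base);

        // Outer cast kept and wrapped in std::launder, asking for a pointer
        // to the char object located at that address.
        char* data = std::launder(reinterpret_cast<char*>(raw));

        for (int i = 0; i < head->size; ++i)
            data[i] = static_cast<char>(i);

        head = head->next;
    }
}

// Hypothetical demo mirroring the question's main: queue is a -> c -> b.
int demo_launder()
{
    node<3> a{};
    node<4> b{};
    node<2> c{};

    c.base.next = &b.base;
    a.base.next = &c.base;

    process_queue(&a.base);

    assert(b.data[3] == 3); // last node was reached and filled as well
    return a.data[2] + c.data[1];
}
```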

huangapple
  • Published on May 25, 2023, 16:17:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76330212.html