如何在SYCL 2020 / DPC++中实现自定义的四维数组查看器/包装器?

huangapple go评论54阅读模式
英文:

How do I Implement a Custom 4-Dimensional Array Viewer/Wrapper in SYCL 2020 / DPC++?

问题

在传统的C++中,可以通过以下方式创建一个多维的“viewer”或“wrapper”来访问线性内存中的1D缓冲区:(1) 定义一个自定义的ArrayWrapper类,(2) 重载()[]运算符作为它的“访问器”,在这个成员函数内进行地址计算,并且(3) 返回对值的引用。因此,可以通过语法糖array(a, b, c, d) = val来访问4D数组。这提高了代码的可读性,也将viewer与数组的实际内存布局解耦。

然而,在DPC++ / SYCL 2020内核中,无法使用这个wrapper。

这是由于使用了C++ lambda函数,它默认会将其作用域外的变量作为const变量捕获,导致该错误。在传统的C++中,可以通过以下两种方式解决这个问题:要么显式要求lambda函数捕获引用 Q.single_task([&array]() {},要么将lambda声明为可变函数 Q.single_task([=]() mutable {}。然而,SYCL似乎不支持这两种用法,并且被DPC++编译器禁止使用。

在DPC++ / SYCL 2020中是否有一种方法来实现相同的语法糖array(a, b, c, d) = val?我注意到,在SYCL中,内存访问是通过称为缓冲区(buffers)和访问器(accessors)的两个抽象来提供的。不幸的是,它们只支持1D、2D或3D数组,不支持更高维度。如何定义一个方便的包装器来访问高维数组?

英文:

In conventional C++, it's possible to create a multi-dimensional "viewer" or "wrapper" to a 1D buffer in linear memory by (1) defining a custom ArrayWrapper class, (2) overriding the () or [] operator as its "accessor", doing the address calculation inside this member function, and (3) returning a reference to the value. Thus, a 4D array can be accessed via the syntactic sugar array(a, b, c, d) = val. This improves code readability, and also decouples the viewer from the actual memory layout of the array.

#include <iostream>
#include <cstdlib>

template <typename T>
class ArrayWrapper
{
public:
        ArrayWrapper(T *buf) : array(buf) {};

        inline T& operator() (size_t a, size_t b, size_t c, size_t d)
        {
                return array[a + b + c + d];
        }

        const inline T& operator() (size_t a, size_t b, size_t c, size_t d) const
        {
                return array[a + b + c + d];
        }
        T *array;
};

int main(void)
{
        int *buf = (int *) malloc(sizeof(int) * 100);
        ArrayWrapper<int> array(buf);
        array(1, 2, 3, 4) = 42;

        std::cout << array(1, 2, 3, 4) << std::endl;
}

However, this wrapper is not usable in a DPC++ / SYCL 2020 kernel.

int main(void)
{
        sycl::queue Q;
        auto buf = sycl::malloc_shared<int>(20, Q);
        ArrayWrapper<int> array(buf);

        Q.single_task([=]() {
                array(1, 2, 3, 4) = 42;
        });
        Q.wait();

        std::cout << array(1, 2, 3, 4) << std::endl;
}

Compiling this function with Intel DPC++ compiler returns the following error:

question-sycl.cpp:37:21: error: expression is not assignable
                array(1, 2, 3, 4) = 42;
                ~~~~~~~~~~~~~~~~~ ^
1 error generated.
make: *** [Makefile:8: question-sycl.elf] Error 1

This is the result due to the use of C++ lambda function, which "captures" variable outside its scope as const variables by default. In conventional C++, this can be solved by either explicitly asking the lambda function to capture a reference Q.single_task([&array]() {}, or declaring the lambda as a mutable function Q.single_task([=]() mutable {}. However, both usages appear to be unsupported in SYCL and prohibited by the DPC++ compiler.

Is there a way to implement the same syntactic sugar array(a, b, c, d) = val in DPC++ / SYCL 2020? I noticed that memory access in SYCL is provided by two abstractions called buffers and accessors. Unfortunately, they only support 1D, 2D, or 3D arrays, not higher dimensions. What is the best way to define a convenient wrapper for accessing high-dimension arrays?

答案1

得分: 1

根据您的要求,以下是翻译好的内容:

正如您所说,在SYCL中,捕获的对象是不可变的,这是有充分理由的:目前不清楚是否所有工作项都应该访问内核参数的共享对象,还是每个工作项都应该有自己的副本 - 最终取决于后端/硬件以及它们想要执行的操作。因此,我们决定所有SYCL内核参数都应该是不可变的。

您有两个选项(您已经找到了其中一个):

  1. 只需复制内核参数:
 Q.single_task([=]() {
   ArrayWrapper<int> a_kernel = array;
   a_kernel(1, 2, 3, 4) = 42;
 });
  1. 您已经找到了这个选项:如果这不可接受,请考虑您的包装器的constness模型。 SYCL内核参数的不可变性实际上关心的是内核参数内部的数据是否发生变化。在您的情况下,您的包装器仅提供了一个视图 - 包装器对象本身不会更改。因此,将const重载返回非const引用可能是可以接受的,这将解决您的问题,并且可能更适合您的用例。请注意,真正的const视图仍然可以通过使用const T类型来实例化您的包装器来表示。如果您愿意,甚至可以实现从ArrayWrapper<T>ArrayWrapper<const T>的转换。正如您所说,这就是sycl::accessor对象的实现方式。

我想指出的是,您可能不必实现自己的高维数组包装器。您应该能够使用mdspan,它已经提供了这个功能,并使用SYCL USM指针进行初始化。我不清楚DPC++,但我知道这在hipSYCL / Open SYCL中可以工作。

英文:

As you say, captured objects in SYCL are not mutable, and for good reason: It is very unclear whether all work items should access a shared object of kernel arguments, or whether each work item should have its own copy -- ultimately this depends strongly on the backend / hardware and what they want to do. So we decided that all SYCL kernel arguments should be immutable.

You have two options (and you have already found one):

  1. Just copy the kernel argument:
 Q.single_task([=]() {
   ArrayWrapper<int> a_kernel = array;
   a_kernel(1, 2, 3, 4) = 42;
 });

  1. You already found this one: If this is not acceptable, think about the constness-model of your wrapper. What the immutability property of SYCL kernel arguments really cares about is whether data within the kernel arguments changes. In your case, your wrapper only provides a view -- the wrapper object itself does not change. As such, it might be acceptable to have the const overload return a non-const reference which would solve your issue and might be more appropriate for your use case. Note that true const views could still be represented by instantiating your wrapper with const T type. You could even implement conversions from ArrayWrapper<T> to ArrayWrapper<const T> if you like. As you say, this is how sycl::accessor objects are implemented.

I want to point out that you might not have to implement your own high-dimensional array wrapper. You should be able to use mdspan which already provides this functionality, and initialize it with a SYCL USM pointer. I have no idea about DPC++, but I know that this work in hipSYCL / Open SYCL.

答案2

得分: 0

Mutable closures are currently not supported by DPC++
The solution is capturing a pointer to array:
auto f = array = &array {
(*array)(1, 2, 3, 4) = 42;
};
sycl::queue Q is not available to me, therefore I have simplified the example by removing the irrelevant Q.

英文:

> Mutable closures are currently not supported by DPC++

The solution is capturing a pointer to array:

    auto f = [array = &array]() {
        (*array)(1, 2, 3, 4) = 42;
    };

sycl::queue Q is not available to me, therefore I have simplified the example by removing the irrelevant Q.

答案3

得分: 0

Update: 这里的问题在于 SYCL 内核期望 arrayoperator() 是一个 const 成员函数,但在这里提供了两个定义,一个是返回引用的非 const 成员函数,另一个是返回 const 值的 const 成员函数。因此,函数的 const 版本被匹配,它返回一个 const 值。按照定义,这是不可修改的。

因此,解决方法是删除 operator() 的 const 返回版本:

const inline T& operator()(size_t a, size_t b, size_t c, size_t d) const
{
    return array[a + b + c + d];
}

并将非 const 返回版本的成员函数更改为 const 成员函数,如下所示:

inline T& operator()(size_t a, size_t b, size_t c, size_t d) const
{
    return array[a + b + c + d];
}

问题解决。

我注意到这也是 SYCL 访问器本身的实现方式

当将 ArrayWrapper 传递给 lambda 函数中的其他函数时(即计算内核),参数必须声明为 const,尽管它们实际上是可修改的。这很令人困惑,但在 SYCL / DPC++ 编程中是一种预期的用法形式。因此,这些包装器不应该被称为 Array,而应该是 ArrayWrapperArrayAccessor,以突出只有包装器本身是 const - 数据不是。


Outdated answer: 273K 的答案给了我一个关于通过捕获指针来可能解决的提示,如下所示:

Q.single_task([array = &array]() {
    (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

不幸的是,它通过了编译器检查,但所有内存写入都没有效果,对主机是不可见的。我猜这是由于对主机和设备之间共享内存的 SYCL 假设的微小违反引起的未定义行为。

但是,在 lambda 函数内部获取引用的地址可以工作:

Q.single_task([=]() {
    auto ptr = &array;
    (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

然而,这种解决方法不是可靠的。它取决于 DPC++ 编译器没有意识到 auto ptr 缺少 const(实际上,编译器拒绝 ArrayWrapper<int>* ptr,但不拒绝 auto ptr)。

英文:

Update: The problem here is that the SYCL kernel expects a const member function of operator() for array, but here, two definitions were provided, one is a non-const member function that returns a reference, another is a const member function that returns a const value. As a result, the const version of the function is matched which returns a const value. This cannot be modified, by definition.

Thus, the solution is to remove the const-return version of operator():

const inline T&amp; operator() (size_t a, size_t b, size_t c, size_t d) const
{
        return array[a + b + c + d];
}

And changing the non-const-return version of the member function to be a const member function, as in:

inline T&amp; operator() (size_t a, size_t b, size_t c, size_t d) const
{
    return array[a + b + c + d];
}

Problem solved.

I noticed that this is also how SYCL accessors themselves are implemented.

When passing a ArrayWrapper into other functions within the lambda function (which is the compute kernel), the argument must be declared const, when they're in fact modifiable. This is confusing but is a expected form of usage in SYCL / DPC++ programming. Thus, these wrappers should not be called Array, but ArrayWrapper or ArrayAccessor to highlight the fact that only the wrapper itself is const - the data is not.


Outdated answer: 273K's answer offered me a hint on the possible workaround of capturing a pointer, as in:

Q.single_task([array = &amp;array]() {
        (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

Unfortunately, it passes the compiler check but all memory writes have no effect and are invisible to the host. I guess it's an undefined behavior caused by a subtle violation of SYCL's assumption on shared memory between host and device.

But, taking the address of the reference inside the lambda function works:

Q.single_task([=]() {
        auto ptr = &amp;array;
        (*ptr)(1, 2, 3, 4) = 42;
});
Q.wait();

However, this workaround is not reliable. It depends on the fact that DPC++ compiler doesn't realize the missing constness of auto ptr (in fact, the compiler rejects ArrayWrapper&lt;int&gt;* ptr but not auto ptr).

huangapple
  • 本文由 发表于 2023年6月6日 11:56:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/76411362.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定