June 22, 2023, 20:21:52

Using half precision with CuPy

Question

I am trying to compile a simple CUDA kernel with CuPy using the half precision format provided by the cuda_fp16 header file.

My kernel looks like this:

code = r'''
extern "C" {

#include <cuda_fp16.h>

__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   }
}

}
'''

I try to compile like this:

options = ('-I/path/to/cuda/include/', )

mod = cp.RawModule(code=code, options=options, backend="nvrtc", jitify=True)
func = mod.get_function("kernel")

However, this results in several compiler errors:

---------------------------------------------------
--- JIT compile log for /tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu ---
---------------------------------------------------
cuda_fp16.hpp(266): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(267): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(270): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(271): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(314): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(315): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(316): error: more than one instance of overloaded function "operator*" has "C" linkage

cuda_fp16.hpp(317): error: more than one instance of overloaded function "operator/" has "C" linkage

cuda_fp16.hpp(319): error: more than one instance of overloaded function "operator+=" has "C" linkage

cuda_fp16.hpp(320): error: more than one instance of overloaded function "operator-=" has "C" linkage

cuda_fp16.hpp(321): error: more than one instance of overloaded function "operator*=" has "C" linkage

cuda_fp16.hpp(322): error: more than one instance of overloaded function "operator/=" has "C" linkage

cuda_fp16.hpp(324): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(325): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(326): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(327): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(329): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(330): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(332): error: more than one instance of overloaded function "operator==" has "C" linkage

cuda_fp16.hpp(333): error: more than one instance of overloaded function "operator!=" has "C" linkage

cuda_fp16.hpp(334): error: more than one instance of overloaded function "operator>" has "C" linkage

cuda_fp16.hpp(335): error: more than one instance of overloaded function "operator<" has "C" linkage

cuda_fp16.hpp(336): error: more than one instance of overloaded function "operator>=" has "C" linkage

cuda_fp16.hpp(337): error: more than one instance of overloaded function "operator<=" has "C" linkage

24 errors detected in the compilation of "/tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu".

Is there anything obvious I am missing?

I am using cupy-cuda11x and CUDA 11.2.

Answer 1

Score: 0

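The error message is fairly clear: the compiler is telling you that cuda_fp16.hpp contains features that C linkage does not support, in this case overloaded operator functions. Because the #include sits inside the extern "C" { ... } block, every function declared in the header is given C linkage, and at most one function with a given name may have C linkage.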

I would expect something like this should compile correctly:

#include <cuda_fp16.h>

extern "C"
__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   }
}

That is, your kernel has C linkage, but the included header does not.
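For completeness, here is a minimal sketch of how the corrected kernel might be compiled and launched from CuPy. It is untested against the exact setup in the question, and several things in it are assumptions rather than anything the question states: the helper names grid_for and run_copy are hypothetical, the 8x8 block size is arbitrary, the include path is the placeholder from the question, and jitify is left at its default (whether jitify=True behaves the same way is not something I have confirmed).

```python
# Sketch only: helper names, block size, and include path are assumptions.

# The header is included at file scope, so its overloaded operators keep
# C++ linkage; only the kernel itself is declared with C linkage.
KERNEL_SOURCE = r'''
#include <cuda_fp16.h>

extern "C"
__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   }
}
'''


def grid_for(nx, ny, block):
    """Smallest grid of block-sized thread blocks that covers an nx-by-ny domain."""
    bx, by = block
    return ((nx + bx - 1) // bx, (ny + by - 1) // by)


def run_copy(include_dir="/path/to/cuda/include/"):
    """Compile the kernel and copy a 12x12 float16 array on the GPU.

    Needs CuPy and a CUDA device. include_dir is the placeholder path from
    the question and has to point at the real CUDA headers.
    """
    import cupy as cp  # imported lazily so this module loads without a GPU

    mod = cp.RawModule(code=KERNEL_SOURCE,
                       options=("-I" + include_dir,),
                       backend="nvrtc")
    func = mod.get_function("kernel")

    # cp.float16 corresponds to the device-side half type.
    src = cp.arange(144, dtype=cp.float16)  # integers 0..143 are exact in float16
    dst = cp.zeros(144, dtype=cp.float16)

    block = (8, 8)
    func(grid_for(12, 12, block), block, (dst, src))  # kernel(f1=dst, f2=src)
    return bool((dst == src).all())
```

On a machine with a GPU, calling run_copy() with the correct include directory should return True if the copy worked. The bounds check inside the kernel is what makes it safe to launch a 2x2 grid of 8x8 blocks (16x16 threads) over a 12x12 array.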
