Using half precision with CuPy
Question
I am trying to compile a simple CUDA kernel with CuPy using the half precision format provided by the cuda_fp16.h header file.
My kernel looks like this:
code = r'''
extern "C" {
#include <cuda_fp16.h>
__global__ void kernel(half * const f1, half * const f2)
{
    if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
    {
        const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
        const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
        f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
    }
}
}
'''
I try to compile like this:
import cupy as cp
options = ('-I/path/to/cuda/include/', )
mod = cp.RawModule(code=code, options=options, backend="nvrtc", jitify=True)
func = mod.get_function("kernel")
However, this results in several compiler errors:
---------------------------------------------------
--- JIT compile log for /tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu ---
---------------------------------------------------
cuda_fp16.hpp(266): error: more than one instance of overloaded function "operator++" has "C" linkage
cuda_fp16.hpp(267): error: more than one instance of overloaded function "operator--" has "C" linkage
cuda_fp16.hpp(270): error: more than one instance of overloaded function "operator+" has "C" linkage
cuda_fp16.hpp(271): error: more than one instance of overloaded function "operator-" has "C" linkage
cuda_fp16.hpp(314): error: more than one instance of overloaded function "operator+" has "C" linkage
cuda_fp16.hpp(315): error: more than one instance of overloaded function "operator-" has "C" linkage
cuda_fp16.hpp(316): error: more than one instance of overloaded function "operator*" has "C" linkage
cuda_fp16.hpp(317): error: more than one instance of overloaded function "operator/" has "C" linkage
cuda_fp16.hpp(319): error: more than one instance of overloaded function "operator+=" has "C" linkage
cuda_fp16.hpp(320): error: more than one instance of overloaded function "operator-=" has "C" linkage
cuda_fp16.hpp(321): error: more than one instance of overloaded function "operator*=" has "C" linkage
cuda_fp16.hpp(322): error: more than one instance of overloaded function "operator/=" has "C" linkage
cuda_fp16.hpp(324): error: more than one instance of overloaded function "operator++" has "C" linkage
cuda_fp16.hpp(325): error: more than one instance of overloaded function "operator--" has "C" linkage
cuda_fp16.hpp(326): error: more than one instance of overloaded function "operator++" has "C" linkage
cuda_fp16.hpp(327): error: more than one instance of overloaded function "operator--" has "C" linkage
cuda_fp16.hpp(329): error: more than one instance of overloaded function "operator+" has "C" linkage
cuda_fp16.hpp(330): error: more than one instance of overloaded function "operator-" has "C" linkage
cuda_fp16.hpp(332): error: more than one instance of overloaded function "operator==" has "C" linkage
cuda_fp16.hpp(333): error: more than one instance of overloaded function "operator!=" has "C" linkage
cuda_fp16.hpp(334): error: more than one instance of overloaded function "operator>" has "C" linkage
cuda_fp16.hpp(335): error: more than one instance of overloaded function "operator<" has "C" linkage
cuda_fp16.hpp(336): error: more than one instance of overloaded function "operator>=" has "C" linkage
cuda_fp16.hpp(337): error: more than one instance of overloaded function "operator<=" has "C" linkage
24 errors detected in the compilation of "/tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu".
Is there anything obvious I am missing?
I am using cupy-cuda11x and CUDA 11.2.
Answer 1
Score: 0
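The error message is pretty clear: the compiler is telling you that cuda_fp16.hpp contains features that C linkage does not support, in this case overloaded operator functions. Because the #include sits inside the extern "C" block, every declaration in the header is given C linkage, and C linkage allows only one function per name.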
I would expect something like this should compile correctly:
#include <cuda_fp16.h>
extern "C"
__global__ void kernel(half * const f1, half * const f2)
{
    if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
    {
        const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
        const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
        f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
    }
}
That is, your function has C linkage, but the included header does not.
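For completeness, here is a minimal sketch of how the corrected kernel might be compiled and launched from CuPy. It reuses the compile options from the question; the function name copy_half, the include_dir parameter, and the test data are hypothetical and chosen only for illustration, and the include path is the same placeholder as in the question. It assumes that float16 CuPy arrays can be passed to a kernel taking half pointers, and it has not been verified against CUDA 11.2 specifically.

```python
# CUDA source as a raw string. The header is included at file scope, so its
# overloaded operators keep C++ linkage; only the kernel is declared extern "C".
code = r'''
#include <cuda_fp16.h>

extern "C" __global__
void kernel(half * const f1, half * const f2)
{
    const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
    const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
    if (ctr_0 < 12 && ctr_1 < 12)
    {
        f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
    }
}
'''


def copy_half(include_dir="/path/to/cuda/include/"):
    """Compile the kernel and copy a 12x12 float16 array on the GPU.

    Hypothetical helper: needs CuPy and a CUDA device, and include_dir
    is a placeholder that must point at the real CUDA include directory.
    """
    import cupy as cp  # imported lazily so this file loads without a GPU

    mod = cp.RawModule(code=code, options=("-I" + include_dir,),
                       backend="nvrtc", jitify=True)
    func = mod.get_function("kernel")

    # Source and destination buffers in half precision.
    f2 = cp.arange(144, dtype=cp.float16).reshape(12, 12)
    f1 = cp.zeros_like(f2)

    # One block of 12x12 threads covers the whole array.
    func((1, 1), (12, 12), (f1, f2))
    return cp.asnumpy(f1), cp.asnumpy(f2)
```

On a machine with a GPU, calling copy_half() should return two equal arrays if the module compiles as expected. The kernel keeps the name "kernel" unmangled because of extern "C", which is what lets get_function find it by that name.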