OpenMP Offload Error during linking with gcc with nvptx-none: unresolved symbol _fputwc_r


Question


I am trying to compile a simple test problem using OpenMP offloading for an Nvidia GPU. I am using gcc with the nvptx-none target. I have installed the gcc+nvptx package with spack (I also compiled gcc-13 with nvptx-tools myself; the results are the same).
During linking, I get the error:

unresolved symbol _fputwc_r
collect2: error: ld returned 1 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /path/to/spack/opt/spack/linux-centos8-x86_64_v3/gcc-13.0.0/gcc-12.2.0-6olbpwbs53cquwnpsvrmuxprmaofwjtk/libexec/gcc/x86_64-pc-linux-gnu/12.2.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed

Compiling with -fno-stack-protector, as recommended e.g. here or here, does not alleviate the problem. -fno-lto does, but then the offloading doesn't work. Different optimization flags make no difference.

The ld that is used seems to be the system installation. The spack installation provides another ld in spack/linux-centos8-x86_64_v3/gcc-13.0.0/gcc-12.2.0-6olbpwbs53cquwnpsvrmuxprmaofwjtk/nvptx-none, but spack does not normally add it to the PATH. I guess with good reason, because including it leads to

as: unrecognized option '--64'
nvptx-as: missing .version directive at start of file '/tmp/cc9YfveM.s'

Is this a problem with the linker, or something else? The problem only occurs when a parallel for loop is actually included; just setting #pragma omp target does not trigger it. The device is recognized, and according to OpenMP the code inside this pragma runs on the device as long as no parallel region is present. Adding a parallel region produces the error above.

Additional information:
The system is Rocky Linux release 8.7 (Green Obsidian)
The test program I am executing is based on the OpenMP test programs. Its full code is:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
void saxpy(float a, float* x, float* y, int sz) {
#pragma omp target teams distribute parallel for simd \
   num_teams(3) map(to:x[0:sz]) map(tofrom:y[0:sz])
   for (int i = 0; i < sz; i++) {
      if (omp_is_initial_device()) {
         printf("Running on host\n");    
      } else {
         int nthreads= omp_get_num_threads();
         int nteams= omp_get_num_teams(); 
         printf("Running on device with %d teams (fixed) in total and %d threads in each team\n",nteams,nthreads);
      }
      fprintf(stdout, "Thread %d %i\n", omp_get_thread_num(), i );
      y[i] = a * x[i] + y[i];
   }
}
int main(int argc, char** argv) {
   float a = 2.0;
   int sz = 16;
   float *x = calloc( sz, sizeof *x );
   float *y = calloc( sz, sizeof *y );
   //Set values
   int num_devices = omp_get_num_devices();
   printf("Number of available devices %d\n", num_devices);
   saxpy( a, x, y, sz );
   return 0;
}

I try to compile it with

gcc -O0 -fopenmp -foffload=nvptx-none -o mintest mintest.c

or with the flags mentioned above.

Answer 1

Score: 1


I guess the issue is that GCC cannot deal with the printf within the code region that runs on the GPU. GPUs are typically not good at any form of I/O, so you should avoid calling functions such as printf, read, or write inside an offloaded code region.
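One way to follow this advice, sketched here under the assumption that the per-iteration output of the test program is only needed for inspection, is to record the values in a mapped array inside the target region and print them on the host afterwards. The names record_thread_ids and print_thread_ids are hypothetical, and the serial fallback is an addition so that the sketch also builds without -fopenmp.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: for each iteration i, store the number of the
   thread that executed it. No I/O happens inside the target region. */
void record_thread_ids(int *tid, int sz)
{
    assert(tid != NULL && sz >= 0);
    #pragma omp target teams distribute parallel for map(from: tid[0:sz])
    for (int i = 0; i < sz; i++) {
        tid[i] = omp_get_thread_num();
    }
}

/* Hypothetical helper: print on the host what the region recorded. */
void print_thread_ids(const int *tid, int sz)
{
    for (int i = 0; i < sz; i++) {
        printf("Thread %d %i\n", tid[i], i);
    }
}
```

Calling record_thread_ids followed by print_thread_ids produces the same kind of "Thread ... i" lines as the fprintf in the question, but all stdio calls stay on the host.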

If you want to detect whether the code ran on the GPU device or on the host, you can use a pattern like this:

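#include <stdio.h>
#include <omp.h>

void test_on_gpu(void) {
    int on_device = 0;
    /* Offload the region; copy on_device back to the host when it ends. */
    #pragma omp target teams map(from:on_device)
    {
        #pragma omp parallel
        {
            /* Only the primary thread of each team enters this block. */
            #pragma omp master
            {
                /* Let a single team (team 0) write the result. */
                if (0 == omp_get_team_num()) {
                    on_device = !omp_is_initial_device();
                }
            }
        }
    }
    /* The I/O happens on the host, outside the offloaded region. */
    printf("on GPU: %s\n", on_device ? "yes" : "no");
}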

What the code does is:

  • transition to the GPU device (target)
  • take one thread (the primary thread, master) in the first OpenMP team and the parallel region there
  • determine if execution happened on the GPU
  • return the test result via map(from:on_device)
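Applied to the saxpy test from the question, the same idea could look like the sketch below. It is not a confirmed fix for the link error. It only keeps every stdio call out of the offloaded regions by using a separate probe region for the team and thread counts. The struct run_info and the functions query_device and saxpy_offload are hypothetical names, and the serial fallbacks are an addition so that the sketch also builds without -fopenmp.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_is_initial_device(void) { return 1; }
static int omp_get_team_num(void) { return 0; }
static int omp_get_num_teams(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical struct: what the probe region reports back to the host. */
struct run_info {
    int on_device;
    int nteams;
    int nthreads;
};

/* Probe region: the primary thread of team 0 records where it runs and
   how many teams and threads were created. Nothing is printed here. */
struct run_info query_device(void)
{
    int on_device = 0, nteams = 0, nthreads = 0;
    #pragma omp target teams num_teams(3) map(from: on_device, nteams, nthreads)
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                if (0 == omp_get_team_num()) {
                    on_device = !omp_is_initial_device();
                    nteams = omp_get_num_teams();
                    nthreads = omp_get_num_threads();
                }
            }
        }
    }
    struct run_info info = { on_device, nteams, nthreads };
    return info;
}

/* The saxpy loop from the question with all I/O removed. */
void saxpy_offload(float a, const float *x, float *y, int sz)
{
    assert(x != NULL && y != NULL && sz >= 0);
    #pragma omp target teams distribute parallel for simd \
        num_teams(3) map(to: x[0:sz]) map(tofrom: y[0:sz])
    for (int i = 0; i < sz; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Host-side report of what the probe region found. */
void print_run_info(struct run_info info)
{
    if (info.on_device) {
        printf("Running on device with %d teams and %d threads in each team\n",
               info.nteams, info.nthreads);
    } else {
        printf("Running on host\n");
    }
}
```

A main function would call query_device and print_run_info once, then saxpy_offload on the two arrays, and could be built with the command from the question (gcc -O0 -fopenmp -foffload=nvptx-none). Whether this avoids the unresolved symbol depends on the toolchain, so treat it as something to try, not as a guaranteed solution.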
