OpenMP task reduction with target offloading segfaults when running single threaded
Question
I was using OpenMP with target offload and found that my application segfaults when limiting OpenMP to a single thread.
I could boil it down to the following snippet:
    #include <omp.h>
    int main(){
        int res = 0;
        #pragma omp parallel num_threads(1)
        {
            #pragma omp single
            {
                #pragma omp taskgroup task_reduction(+:res)
                {
                    #pragma omp target in_reduction(+:res) nowait
                    {
                        res++;
                    }
                }
            }
        }
    }
Compiled with
    clang++ -fopenmp -fopenmp-targets=nvptx64 --offload-arch=sm_61 -O0 main.cpp
using clang 17.0.0 and CUDA 12.1, and run on an Ubuntu 22.04 machine with a 12700K/1080 Ti.
This segfaults when num_threads is set to 1, but works fine with more than one thread (e.g. num_threads(2)) or when nowait is omitted from the target task, so that it synchronizes at the end of the target region.
From my understanding, this should work just fine even with a single thread.
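For comparison, the variant described above as working (the same snippet with nowait dropped from the target construct) can be sketched as follows. The function name increment_on_target is hypothetical and only wraps the snippet so the result can be inspected; whether this runs cleanly on a given toolchain still depends on that compiler's in_reduction support, so treat it as a sketch and not a verified fix.

```cpp
// Hypothetical wrapper (not from the original post): the reproducer with
// `nowait` removed, so the target region completes before the encountering
// thread leaves the target construct.
int increment_on_target() {
    int res = 0;
    #pragma omp parallel num_threads(1)
    {
        #pragma omp single
        {
            #pragma omp taskgroup task_reduction(+:res)
            {
                // No `nowait` here: the thread waits at the end of the
                // target region instead of deferring it as a task.
                #pragma omp target in_reduction(+:res)
                {
                    res++;
                }
            }
        }
    }
    // After the taskgroup ends, the reduction has been combined into res.
    return res;
}
```

Calling increment_on_target() from main and printing the return value gives a quick way to check that the single increment actually reached the host copy of res.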
Answer 1
Score: 0
According to the mailing list and GitHub issues, support for in_reduction in clang is incomplete or missing as of this writing.