2023-06-06 17:31:59

Segmentation fault when using cusolverSpScsrlsvchol in CUDA for sparse linear problems

Question

I'm trying to port a linear problem to CUDA in order to speed up solving times. I have successfully used cusolverDn to handle dense problems on the GPU. However, when I try to solve sparse problems with cusolverSpScsrlsvchol, I keep getting a segmentation fault.

To debug the issue, I ran the program under the CUDA Compute Sanitizer and received the following output:

$ /c/Programme/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/bin/compute-sanitizer.bat --tool memcheck bin/FEMaster_gpu.exe
========= COMPUTE-SANITIZER
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 0 errors
Segmentation fault

I narrowed down the problem to the following minimal code snippet:

cusolverSpHandle_t handle_cusolver_sp;
cusparseHandle_t   handle_cusparse;
// loading handles
cusolverSpCreate(&handle_cusolver_sp);
cusparseCreate  (&handle_cusparse);
// get properties
cudaSetDevice(0);
// create csr arrays on cpu
float host_csr_values[4]{1,1,1,1};
int   host_csr_col_id[4]{0,1,2,3};
int   host_csr_row_pt[5]{0,1,2,3,4};
float host_rhs       [4]{0,3,7,1};
int   host_singular  [1]{0};
// allocate arrays on the gpu
float* dev_csr_values;
int  * dev_csr_col_id;
int  * dev_csr_row_pt;
float* dev_rhs;
int  * dev_singular;
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_values,4 * sizeof(float)));
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_col_id,4 * sizeof(int  )));
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_row_pt,5 * sizeof(int  )));
runtime_assert_cuda(cudaMalloc((void**) &dev_rhs       ,4 * sizeof(float)));
runtime_assert_cuda(cudaMalloc((void**) &dev_singular  ,1 * sizeof(int  )));
// move data to gpu
runtime_assert_cuda(cudaMemcpy(dev_csr_values, host_csr_values, 4 * sizeof(float), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_csr_col_id, host_csr_col_id, 4 * sizeof(int  ), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_csr_row_pt, host_csr_row_pt, 5 * sizeof(int  ), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_rhs       , host_rhs       , 4 * sizeof(float), cudaMemcpyHostToDevice));
// create matrix descriptor
cusparseMatDescr_t descr;
runtime_assert_cuda(cusparseCreateMatDescr(&descr));
runtime_assert_cuda(cusparseSetMatType     (descr, CUSPARSE_MATRIX_TYPE_GENERAL));
runtime_assert_cuda(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO    ));
runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                          4,
                                          4,
                                          descr,
                                          dev_csr_values,
                                          dev_csr_row_pt,
                                          dev_csr_col_id,
                                          dev_rhs,
                                          0,    // tolerance
                                          0,    // reorder
                                          dev_rhs,
                                          dev_singular));

The values I use for the sparse matrix are those of a diagonal matrix.
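As a host-side sanity check, the CSR arrays from the snippet can be expanded into a dense matrix and solved directly on the CPU. The sketch below is not part of the original post; the helper names `csr_to_dense` and `solve_diagonal_csr` are hypothetical, and it assumes zero-based indexing and exactly one diagonal entry per row, as in the arrays above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: expand zero-based CSR arrays into a dense,
// row-major n x n matrix so the structure can be inspected.
std::vector<float> csr_to_dense(int n,
                                const std::vector<float>& values,
                                const std::vector<int>& col_id,
                                const std::vector<int>& row_pt) {
    std::vector<float> dense(static_cast<std::size_t>(n) * n, 0.0f);
    for (int row = 0; row < n; ++row) {
        for (int k = row_pt[row]; k < row_pt[row + 1]; ++k) {
            dense[static_cast<std::size_t>(row) * n + col_id[k]] = values[k];
        }
    }
    return dense;
}

// Hypothetical helper: solve A x = b when A is purely diagonal in CSR form.
// Throws if a row does not hold exactly one nonzero entry on the diagonal.
std::vector<float> solve_diagonal_csr(int n,
                                      const std::vector<float>& values,
                                      const std::vector<int>& col_id,
                                      const std::vector<int>& row_pt,
                                      const std::vector<float>& rhs) {
    std::vector<float> x(n, 0.0f);
    for (int row = 0; row < n; ++row) {
        const int begin = row_pt[row];
        const int end   = row_pt[row + 1];
        if (end - begin != 1 || col_id[begin] != row || values[begin] == 0.0f) {
            throw std::invalid_argument("matrix is not a nonsingular diagonal matrix");
        }
        x[row] = rhs[row] / values[begin];
    }
    return x;
}
```

With the arrays from the question, the dense expansion is the 4x4 identity, so the reference solution equals the right-hand side {0, 3, 7, 1}. A working GPU solve should reproduce those values.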

I removed the memory deallocation, output retrieval, and other similar calls for simplicity. The code seems straightforward, but it results in a segmentation fault. The issue occurs specifically during the call to cusolverSpScsrlsvchol.

I've been stuck on this problem for over a day and I can't figure out why it's not working. Any help would be greatly appreciated!

Answer 1

Score: 2

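The cuSOLVER API documentation states that the singularity parameter of cusolverSpScsrlsvchol must be in host memory, not device memory. The code passes dev_singular, a pointer obtained from cudaMalloc, so the library writes through a device address on the host side, which is consistent with the segmentation fault. Pass a pointer to a host int (for example host_singular) instead.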
