Segmentation fault when using cusolverSpScsrlsvchol in CUDA for sparse linear problems.


Question


I'm trying to port a linear problem to CUDA to speed up solve times. I have successfully used cusolverDn for dense problems on the GPU. However, when I try to solve sparse problems with cusolverSpScsrlsvchol, I keep getting a segmentation fault.

To debug the issue, I ran the program under NVIDIA's compute-sanitizer (memcheck tool) and got the following output:

    $ /c/Programme/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/bin/compute-sanitizer.bat --tool memcheck bin/FEMaster_gpu.exe
    ========= COMPUTE-SANITIZER
    ========= Error: process didn't terminate successfully
    ========= Target application returned an error
    ========= ERROR SUMMARY: 0 errors
    Segmentation fault

I narrowed down the problem to the following minimal code snippet:

    cusolverSpHandle_t handle_cusolver_sp;
    cusparseHandle_t   handle_cusparse;

    // loading handles
    cusolverSpCreate(&handle_cusolver_sp);
    cusparseCreate  (&handle_cusparse);

    // get properties
    cudaSetDevice(0);

    // create csr arrays on cpu
    float host_csr_values[4]{1,1,1,1};
    int   host_csr_col_id[4]{0,1,2,3};
    int   host_csr_row_pt[5]{0,1,2,3,4};
    float host_rhs       [4]{0,3,7,1};
    int   host_singular  [1]{0};

    // allocate arrays on the gpu
    float* dev_csr_values;
    int  * dev_csr_col_id;
    int  * dev_csr_row_pt;
    float* dev_rhs;
    int  * dev_singular;

    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_values, 4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_col_id, 4 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_row_pt, 5 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_rhs       , 4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_singular  , 1 * sizeof(int  )));

    // move data to gpu
    runtime_assert_cuda(cudaMemcpy(dev_csr_values, host_csr_values, 4 * sizeof(float), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_col_id, host_csr_col_id, 4 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_row_pt, host_csr_row_pt, 5 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_rhs       , host_rhs       , 4 * sizeof(float), cudaMemcpyHostToDevice));

    // create matrix descriptor
    cusparseMatDescr_t descr;
    runtime_assert_cuda(cusparseCreateMatDescr(&descr));
    runtime_assert_cuda(cusparseSetMatType     (descr, CUSPARSE_MATRIX_TYPE_GENERAL));
    runtime_assert_cuda(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO    ));

    runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                              4,
                                              4,
                                              descr,
                                              dev_csr_values,
                                              dev_csr_row_pt,
                                              dev_csr_col_id,
                                              dev_rhs,
                                              0, // tolerance
                                              0, // reorder
                                              dev_rhs,
                                              dev_singular));

The CSR arrays above describe a 4×4 diagonal matrix (in this case the identity).
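As a sanity check on the input, the CSR arrays can be expanded and solved on the CPU. The sketch below is illustrative only: the helper names csr_to_dense and solve_diagonal are hypothetical (they are not part of the question's code or of any CUDA library), and it assumes zero-based indexing and a strictly diagonal matrix.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: expand a zero-based n x n CSR matrix into a dense
// row-major array so its structure can be inspected on the CPU.
std::vector<float> csr_to_dense(int n,
                                const std::vector<float>& values,
                                const std::vector<int>&   row_ptr,
                                const std::vector<int>&   col_idx) {
    std::vector<float> dense(static_cast<std::size_t>(n) * n, 0.0f);
    for (int row = 0; row < n; ++row) {
        // Entries of this row live in [row_ptr[row], row_ptr[row + 1]).
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            dense[static_cast<std::size_t>(row) * n + col_idx[k]] = values[k];
        }
    }
    return dense;
}

// Hypothetical helper: solve A x = b for a diagonal A stored dense row-major.
// Throws if A has an off-diagonal nonzero or a zero on the diagonal.
std::vector<float> solve_diagonal(int n,
                                  const std::vector<float>& dense,
                                  const std::vector<float>& b) {
    std::vector<float> x(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i != j && dense[static_cast<std::size_t>(i) * n + j] != 0.0f) {
                throw std::invalid_argument("matrix is not diagonal");
            }
        }
        const float d = dense[static_cast<std::size_t>(i) * n + i];
        if (d == 0.0f) {
            throw std::invalid_argument("zero on the diagonal");
        }
        x[i] = b[i] / d;
    }
    return x;
}
```

Called with the question's arrays (values {1,1,1,1}, row pointers {0,1,2,3,4}, column indices {0,1,2,3}), csr_to_dense yields the identity, so solve_diagonal returns the right-hand side {0,3,7,1} unchanged. That is the result to compare against once the GPU solver runs.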

I removed the memory deallocation, output retrieval, and other similar calls for simplicity. The code seems straightforward, but it results in a segmentation fault. The issue occurs specifically during the call to cusolverSpScsrlsvchol.

I've been stuck on this problem for over a day and I can't figure out why it's not working. Any help would be greatly appreciated!

Answer 1

Score: 2

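The API documentation states that the singularity parameter (the last argument of cusolverSpScsrlsvchol) must point to host memory, not device memory. In the question's code, dev_singular is allocated with cudaMalloc, so the library ends up writing through a device address from host code. That is consistent with a host-side segmentation fault for which compute-sanitizer reports 0 errors. Passing the address of a plain host int instead (for example the existing host_singular array) should resolve it.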

huangapple
  • Posted on June 6, 2023 at 17:31:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76413263.html