Device-wide synchronization in SYCL on NVIDIA GPUs


Context
I'm porting a complex CUDA application to SYCL. It launches its kernels on multiple CUDA streams (cudaStream_t), and in some cases it also uses the default stream, which forces a device-wide synchronization.

Problem
CUDA streams map quite easily to in-order SYCL queues. However, at a device-wide synchronization point (i.e. cudaDeviceSynchronize()), I must explicitly wait on all the queues, because queue::wait() only waits on the commands submitted to that queue.

Question
Is there a way to wait on all the commands for a specific device, without having to explicitly call wait() on every queue?

Answer 1

Score: 1

In general, there are two ways you might be able to mimic this behavior in SYCL.

  1. You can wait on every queue, as you suggest.
  2. You can wait on all the events that make up the work of your CUDA streams, using event::wait(const std::vector<event> &) or event::wait_and_throw(const std::vector<event> &).

The first is precisely what you suggest, but then you are waiting for each whole queue to empty. The second option lets you wait only for the events you care about, without waiting for the whole queue to drain.

In either case, though, you do have to do some bookkeeping to ensure that you are waiting on each item you expect to complete before proceeding with your algorithm.
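A minimal sketch of what that bookkeeping could look like for option 2. EventLog and its member names are hypothetical, not part of SYCL. It is written as a template so it compiles without a SYCL toolchain; the intended Event type is sycl::event, whose static wait(const std::vector<event> &) is what the sketch calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SYCL. Event is expected to be sycl::event;
// the only requirement is a static Event::wait(const std::vector<Event> &).
template <typename Event>
class EventLog {
public:
  explicit EventLog(std::size_t num_streams) : per_stream_(num_streams) {}

  // Remember an event returned by a submission to the given stream's queue,
  // e.g. log.record(0, q0.submit(...)).
  void record(std::size_t stream, const Event &e) {
    assert(stream < per_stream_.size());
    per_stream_[stream].push_back(e);
  }

  // Number of recorded events not yet waited on for one stream.
  std::size_t pending(std::size_t stream) const {
    assert(stream < per_stream_.size());
    return per_stream_[stream].size();
  }

  // Wait for the events recorded on one stream, then forget them.
  void wait_stream(std::size_t stream) {
    assert(stream < per_stream_.size());
    Event::wait(per_stream_[stream]);
    per_stream_[stream].clear();
  }

  // Wait for every recorded event on every stream.
  void wait_all() {
    for (auto &events : per_stream_) {
      Event::wait(events);
      events.clear();
    }
  }

private:
  std::vector<std::vector<Event>> per_stream_;
};
```

Note that wait_all() only covers events that were actually recorded, so any submission whose event is dropped is invisible to it. If asynchronous errors should be surfaced at the sync point, Event::wait could be swapped for Event::wait_and_throw.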

As Sri mentioned, you can use SYCLomatic. The way SYCLomatic translates this code is to create a function that loops over all the queues and waits on each one, as in option 1 above.
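A rough sketch of that loop-over-queues idea. This is not SYCLomatic's actual helper; DeviceQueues and its members are hypothetical names. It is templated on the queue type so it builds without a SYCL toolchain; the intended Queue is sycl::queue, which is copyable with reference semantics and provides wait().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical registry, not part of SYCL or SYCLomatic. Queue is expected
// to be sycl::queue; the only requirements are copyability and wait().
// Intended use (assumes a sycl::device named dev):
//   DeviceQueues<sycl::queue> streams;
//   auto s0 = streams.add(sycl::queue{dev, sycl::property::queue::in_order{}});
template <typename Queue>
class DeviceQueues {
public:
  // Register a queue and return its index, which plays the role of the
  // original CUDA stream handle.
  std::size_t add(const Queue &q) {
    queues_.push_back(q);
    return queues_.size() - 1;
  }

  // Look up the queue that stands in for a given stream.
  Queue &at(std::size_t stream) {
    assert(stream < queues_.size());
    return queues_[stream];
  }

  // Stand-in for cudaDeviceSynchronize(): wait on every registered queue.
  void wait_all() {
    for (auto &q : queues_) {
      q.wait();
    }
  }

private:
  std::vector<Queue> queues_;
};
```

Unlike cudaDeviceSynchronize(), wait_all() only sees queues that were registered through add(), so every queue created for the device has to go through the registry for the sync point to be complete.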

Hopefully this helps. I wish it were a one-liner too, but the abstractions are slightly different.

huangapple
  • Posted on February 14, 2023, 01:40:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439410.html