Multiple Spark Executors on a single GPU

Question

We are trying to improve Spark job performance by adding GPUs to the nodes. But after enabling GPUs on Spark 3, job performance got worse, because only a limited number of executors can be created with GPUs enabled.

For example, with CPU cores only (no GPUs) we can create hundreds of executors, since we have hundreds of CPU cores.

With GPUs enabled, we can create only 6 executors, since we have only 6 GPUs.

So, is there any way to run multiple executors on a single GPU?

Answer 1

Score: 1

If you are using Spark resource scheduling to assign executors to GPUs, I do not believe there is a way to assign multiple executors to the same GPU. The executor resource amount config (`spark.executor.resource.gpu.amount`) is an integer, so you cannot assign a fraction of a GPU to each executor.
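Although the executor amount must be a whole number, Spark's documentation describes a per-task setting, `spark.task.resource.gpu.amount`, that may be fractional (at most 0.5), so several tasks inside one executor can share that executor's GPU. The sketch below is only an illustration of that arithmetic, not something from the original answer: the helper names `tasks_per_gpu` and `shared_gpu_conf` are hypothetical, and it assumes one GPU per executor and `spark.task.cpus` left at 1.

```python
import math


def tasks_per_gpu(task_gpu_amount: float) -> int:
    """Number of task slots one GPU provides for a fractional task amount.

    Spark floors 1 / amount, and fractional amounts must be <= 0.5.
    """
    if not 0 < task_gpu_amount <= 0.5:
        raise ValueError("fractional task GPU amount must be in (0, 0.5]")
    return math.floor(1 / task_gpu_amount)


def shared_gpu_conf(executor_cores: int, task_gpu_amount: float):
    """Build conf entries for one GPU per executor shared by several tasks.

    Returns the conf dict and the number of tasks that can run at once
    in each executor (assumption: spark.task.cpus is 1).
    """
    slots = tasks_per_gpu(task_gpu_amount)
    conf = {
        "spark.executor.resource.gpu.amount": "1",  # whole GPUs only
        "spark.task.resource.gpu.amount": str(task_gpu_amount),
        "spark.executor.cores": str(executor_cores),
    }
    # Concurrency is limited by both CPU cores and GPU slots.
    return conf, min(executor_cores, slots)


conf, concurrent = shared_gpu_conf(executor_cores=8, task_gpu_amount=0.25)
```

With these example values, each executor still holds one whole GPU, but up to four of its tasks can use that GPU at the same time. This does not raise the executor count above the number of GPUs; it raises the number of tasks per executor instead.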

If you are bypassing Spark's GPU scheduling feature and assigning executors to GPUs via some other mechanism, there might be a way to have executors share a GPU. However, this depends on the software within the executor that is using the GPU, and on whether that software can be configured not to assume it can use the entire GPU. It may need to artificially use less GPU memory to make room for other executors, which may cause it to perform suboptimally or hit out-of-memory errors. There is also context-switching overhead between processes that share a GPU, which can hurt performance relative to each process having exclusive use of its own GPU.
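As a rough illustration of the trade-off described above, the following sketch spreads executor processes across GPUs round-robin and computes how much GPU memory each one could claim. It is a sketch under stated assumptions, not a Spark feature: the helper names `assign_gpu` and `memory_budget_mib` are hypothetical, the executor is assumed to be launched with the returned environment, and whether the GPU software honors a memory cap at all depends on that software. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable that restricts which devices a process sees.

```python
def assign_gpu(executor_index: int, num_gpus: int) -> dict:
    """Pick a GPU for an executor process, round-robin by index.

    Returns environment variables to set when launching the executor.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    gpu_id = executor_index % num_gpus
    return {"CUDA_VISIBLE_DEVICES": str(gpu_id)}


def memory_budget_mib(gpu_memory_mib: int, executors_per_gpu: int,
                      headroom_mib: int = 1024) -> int:
    """Per-executor GPU memory budget, leaving some headroom unallocated.

    The headroom value is an illustrative assumption, not a recommendation.
    """
    if executors_per_gpu <= 0:
        raise ValueError("executors_per_gpu must be positive")
    usable = gpu_memory_mib - headroom_mib
    if usable <= 0:
        raise ValueError("headroom leaves no usable GPU memory")
    return usable // executors_per_gpu


# Example: 24 executors spread over 6 GPUs with 16 GiB each.
envs = [assign_gpu(i, num_gpus=6) for i in range(24)]
budget = memory_budget_mib(gpu_memory_mib=16384, executors_per_gpu=4)
```

In this example each GPU is shared by four executors, and each executor would need to stay within 3840 MiB, which is where the risk of suboptimal performance or out-of-memory errors comes from.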

huangapple
  • Posted on July 17, 2023 at 16:20:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76702634.html