
Spark: number of executors when dynamic allocation is enabled

Question

I have an r5.8xlarge AWS cluster with 12 nodes, so there are 6144 cores (12 nodes * 32 vCPU * 16 cores). I have set --executor-cores=5 and enabled dynamic allocation using the spark-submit command below. Even after setting spark.dynamicAllocation.initialExecutors=150 and --conf spark.dynamicAllocation.minExecutors=150, I'm only seeing 70 executors in the Spark UI. What am I doing wrong?

r5.8xlarge nodes have 256GB each, so 3072GB total (256GB * 12 nodes).

FYI - I'm not including the driver node in this calculation.

--driver-memory 200G --deploy-mode client --executor-memory 37G --executor-cores 7 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.driver.maxResultSize=0 --conf spark.sql.shuffle.partitions=2000 --conf spark.dynamicAllocation.initialExecutors=150 --conf spark.dynamicAllocation.minExecutors=150
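
For a sanity check independent of the UI page, here is a minimal sketch that counts live executors through Spark's monitoring REST API, which the driver serves alongside the UI. It assumes a client-mode driver reachable on localhost at the default UI port 4040 and the third-party requests package; adjust the host/port for your setup:

```python
# Hedged sketch: count registered executors via Spark's monitoring REST API.
# Assumptions: client-mode driver on localhost, default UI port 4040, and
# the third-party "requests" package installed.
import requests

BASE = "http://localhost:4040/api/v1"

# First application served by this driver UI.
app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

# The /executors endpoint lists the driver too; filter it out.
executors = requests.get(f"{BASE}/applications/{app_id}/executors").json()
workers = [e for e in executors if e["id"] != "driver"]
print(f"{len(workers)} executors currently registered")
```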

Answer 1

Score: 1

You have 256GB per node and 37G per executor. An executor can only run on one node (an executor cannot be shared between multiple nodes), so each node can hold at most 6 executors (256 / 37 = 6). Since you have 12 nodes, the maximum number of executors is 6 * 12 = 72, which explains why you see only 70 executors in your Spark UI (the difference of 2 executors is likely caused by the memory allocated to the driver, or by a memory allocation problem on some nodes).

If you want more executors, you have to decrease the executor memory. Also, to fully utilize your cluster, make sure the remainder of the node memory divided by the executor memory is as close to zero as possible, e.g.:

  • 256GB per node and 37G per executor: 256 / 37 = 6.9 => 6 executors per node (34G lost per node)

  • 256GB per node and 36G per executor: 256 / 36 = 7.1 => 7 executors per node (only 4G lost per node, so you reclaim 30G of previously unused memory per node)

If you want at least 150 executors, the executor memory should be at most 19G (150 / 12 nodes = 12.5, so you need 13 executors per node, and 256 / 13 ≈ 19.7).
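
To make the arithmetic above concrete, here is a minimal sketch of the capacity calculation in plain Python. The helper names are hypothetical, and the division is deliberately naive: in practice YARN also reserves spark.executor.memoryOverhead (default max(384MB, 10% of executor memory)) plus OS overhead, so slightly fewer executors may fit per node than this suggests:

```python
# Naive capacity arithmetic from the answer above. Helper names are
# hypothetical; real clusters also subtract spark.executor.memoryOverhead
# (default max(384MB, 10% of executor memory)) and OS/YARN reservations.

NODE_MEM_GB = 256  # r5.8xlarge memory per node
NODES = 12

def executors_per_node(executor_mem_gb):
    # An executor must fit entirely on one node, so the per-node count
    # is the integer quotient of node memory by executor memory.
    return NODE_MEM_GB // executor_mem_gb

def cluster_max_executors(executor_mem_gb):
    return NODES * executors_per_node(executor_mem_gb)

print(cluster_max_executors(37))  # 72  -> matches the ~70 seen in the UI
print(cluster_max_executors(36))  # 84  -> 7 per node, only 4GB wasted each
print(cluster_max_executors(19))  # 156 -> at most 19GB to reach >= 150
```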
