spark number of executors when dynamic allocation is enabled
Question
I have an r5.8xlarge AWS cluster with 12 nodes, so there are 6144 cores (12 nodes * 32 vCPU * 16 cores). I have set --executor-cores=5 and enabled dynamic allocation using the spark-submit command below. Even after setting --conf spark.dynamicAllocation.initialExecutors=150 --conf spark.dynamicAllocation.minExecutors=150, I'm only seeing 70 executors in the Spark UI for the application. What am I doing wrong?
r5.8xlarge clusters have 256GB per node, so 3072GB in total (256GB * 12 nodes).
FYI - I'm not including the driver node in this calculation.
--driver-memory 200G --deploy-mode client --executor-memory 37G --executor-cores 7 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.driver.maxResultSize=0 --conf spark.sql.shuffle.partitions=2000 --conf spark.dynamicAllocation.initialExecutors=150 --conf spark.dynamicAllocation.minExecutors=150
Answer 1
Score: 1
You have 256GB per node and 37G per executor. An executor can only run on a single node (an executor cannot be shared between multiple nodes), so each node can hold at most 6 executors (256 / 37 = 6). Since you have 12 nodes, the maximum number of executors is 6 * 12 = 72, which explains why you only see 70 executors in your Spark UI (the difference of 2 executors is likely due to the memory allocated to the driver or to memory-allocation overhead on some nodes).
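As a sanity check, the packing arithmetic can be reproduced with a few lines of Python. This is a minimal sketch that models memory only; it ignores spark.executor.memoryOverhead and YARN/OS reservations, which is likely where the missing 2 executors go:

# How many 37G executors fit on a 256GB node, and across 12 nodes?
nodes = 12
node_mem_gb = 256
executor_mem_gb = 37

per_node = node_mem_gb // executor_mem_gb                    # 256 // 37 = 6
total = per_node * nodes                                     # 6 * 12 = 72
unused_per_node = node_mem_gb - per_node * executor_mem_gb   # 256 - 222 = 34
print(per_node, total, unused_per_node)                      # 6 72 34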
If you want more executors, you have to decrease the executor memory. Also, to fully utilize your cluster, make sure the remainder of the node memory divided by the executor memory is as close to zero as possible. For example:
- 256GB per node and 37G per executor: 256 / 37 ≈ 6.9 => 6 executors per node (34G lost per node)
- 256GB per node and 36G per executor: 256 / 36 ≈ 7.1 => 7 executors per node (only 4G lost per node, so you reclaim 30G of usable memory per node compared to the 37G setting)
If you want at least 150 executors, then the executor memory should be at most 19G.
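The 19G figure follows from the same model: work out how many executors each node must hold to reach the target, then take the largest whole-gigabyte executor that still fits. A rough sketch, again ignoring memory overhead:

import math

# Largest executor memory that still allows 150 executors on 12 x 256GB nodes.
nodes = 12
node_mem_gb = 256
target_executors = 150

per_node_needed = math.ceil(target_executors / nodes)   # ceil(150 / 12) = 13
max_executor_mem_gb = node_mem_gb // per_node_needed     # 256 // 13 = 19
print(per_node_needed, max_executor_mem_gb)              # 13 19

Applied to your submit command, that means lowering --executor-memory from 37G to at most 19G (or to 36G if 7 executors per node, 84 in total, is enough).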