Spark: How much executor memory is available for application use?

Question

I am writing an algorithm that processes a "chunk" of data in memory. I'm using JavaPairRDD.groupByKey() to designate the chunks, but it is unclear to me how to calculate the optimal chunk size. The larger it is, the faster the algorithm will run. Given the chunk size, I can estimate my memory use, but how much executor memory is actually available to me (as opposed to claimed by Spark for its own use)? And is there any way to programmatically suggest to Spark that I have a memory-intensive step in the transformation chain?
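For context, here is a minimal sketch of the pattern being described (my own illustration, not code from the question; the chunking key and record values are hypothetical): keys designate the chunks, and groupByKey() materializes each chunk in memory on an executor.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class ChunkSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "chunk-sketch");

        JavaRDD<String> records = sc.parallelize(Arrays.asList("a1", "a2", "b1", "b2"));

        // Hypothetical chunking key: here, the first character of each record.
        JavaPairRDD<String, String> keyed =
                records.mapToPair(r -> new Tuple2<>(r.substring(0, 1), r));

        // Each grouped Iterable<String> below is one in-memory "chunk"; its size is
        // what has to fit in the executor memory actually left to application code.
        keyed.groupByKey()
             .foreach(chunk -> System.out.println(chunk._1 + " -> " + chunk._2));

        sc.stop();
    }
}
```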
Answer 1

Score: 0
Never mind, this post explains it really well. You get

(HeapSize - ReservedMemory) * (1.0 - spark.memory.fraction)

which for a 4 GB heap is about 1500 MB, assuming default settings for the other parameters.
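To make the arithmetic concrete, here is a minimal sketch (my own illustration, not code from the answer) that plugs Spark's fixed 300 MB reserved memory and the default spark.memory.fraction of 0.6 into the formula above for a 4 GB executor heap.

```java
public class UserMemoryEstimate {
    public static void main(String[] args) {
        long heapBytes = 4L * 1024 * 1024 * 1024;   // executor heap, e.g. --executor-memory 4g
        long reservedBytes = 300L * 1024 * 1024;    // Spark's fixed reserved memory (300 MB)
        double sparkMemoryFraction = 0.6;           // default spark.memory.fraction

        // Memory left for user data structures after Spark claims its unified pool.
        long userBytes = (long) ((heapBytes - reservedBytes) * (1.0 - sparkMemoryFraction));

        System.out.printf("Approx. user memory: %d MB%n", userBytes / (1024 * 1024));
        // For a 4 GB heap this prints roughly 1518 MB, i.e. the ~1500 MB quoted above.
    }
}
```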
Comments