Spark: How much executor memory is available for application use?

Question

I am writing an algorithm that processes a "chunk" of data in memory. I'm using JavaPairRDD.groupByKey() to designate the chunks, but it is unclear to me how to calculate the optimal chunk size. The larger it is, the faster the algorithm will run. Given the chunk size, I can estimate my memory use, but how much executor memory is actually available to me (as opposed to claimed by Spark for its own use)? And is there any way to programmatically suggest to Spark that I have a memory-intensive step in the transformation chain?
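
For illustration, a minimal sketch of the chunking pattern described above. The names sc, records, and numChunks are hypothetical placeholders, not from the original question:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    // sc is a hypothetical JavaSparkContext; the data here is a stand-in.
    JavaRDD<String> records = sc.parallelize(Arrays.asList("a", "b", "c"));

    // numChunks is a hypothetical knob: fewer chunks mean larger groups,
    // which run faster but need more executor memory per task.
    int numChunks = 64;
    JavaPairRDD<Integer, Iterable<String>> chunks = records
            .mapToPair(r -> new Tuple2<>(Math.floorMod(r.hashCode(), numChunks), r))
            .groupByKey(); // each Iterable is one in-memory "chunk"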

Answer 1

Score: 0

Never mind, this post explains it really well. You get

(HeapSize - ReservedMemory) * (1.0 - spark.memory.fraction)

which for a 4 GB heap is about 1500 MB, assuming default settings for the other parameters: ReservedMemory defaults to 300 MB and spark.memory.fraction to 0.6, so (4096 - 300) * 0.4 ≈ 1518 MB.
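
As a sanity check, a minimal sketch of that arithmetic in Java. It assumes the defaults named above (300 MB reserved, spark.memory.fraction = 0.6) rather than reading them from the live SparkConf:

    public class UserMemoryEstimate {
        public static void main(String[] args) {
            long heapBytes = Runtime.getRuntime().maxMemory(); // JVM heap (-Xmx)
            long reservedBytes = 300L * 1024 * 1024;           // Spark's reserved memory (default)
            double memoryFraction = 0.6;                       // spark.memory.fraction (default)
            long userBytes = (long) ((heapBytes - reservedBytes) * (1.0 - memoryFraction));
            System.out.printf("Approximate user memory: %d MB%n", userBytes / (1024 * 1024));
        }
    }

Run with a 4 GB heap, this prints roughly 1500 MB, matching the estimate above.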
