How is executor memory determined when running PySpark in local mode?


If I submit a Spark program as

    spark-submit --driver-memory 500M --executor-memory 300M spark_dataframe_example.py

then the executor memory allocated is "110 MiB", even though I set it to "300M". See the image below:

[Spark UI screenshot: allocated executor memory shows 110 MiB]
If I submit a Spark program as

    spark-submit --driver-memory 1G --executor-memory 200M --num-executors 2 spark_dataframe_example.py

then the executor memory allocated is "413.9 MiB", even though I set it to "200M". See the image below:

[Spark UI screenshot: allocated executor memory shows 413.9 MiB]

So could someone confirm how this executor memory is allocated?

Answer 1

Score: 2


As was said in the comments, the --executor-memory flag is ignored in local mode. To confirm this, try running your spark-submit command with a ridiculously high --executor-memory value (more than your machine has): it won't complain, because the flag is ignored.

Running Spark in local mode is a bit of an exception, since your driver and executor run inside a single JVM. So the value that counts here is your --driver-memory flag.

Now, where do those 110 MiB and 413.9 MiB come from? In version 3.3.2 - the most recent version at the time of this post - they are the result of this calculation (from UnifiedMemoryManager.getMaxMemory):

    val systemMemory = conf.get(TEST_MEMORY)
    val reservedMemory = conf.getLong(TEST_RESERVED_MEMORY.key,
      if (conf.contains(IS_TESTING)) 0 else RESERVED_SYSTEM_MEMORY_BYTES)

    // skipping some irrelevant lines
    ...

    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.get(config.MEMORY_FRACTION)
    (usableMemory * memoryFraction).toLong

There are three values whose origins we need to trace:

  • systemMemory comes from TEST_MEMORY, which ultimately resolves to the Xmx value of your running JVM process. This will always be a bit smaller than your --driver-memory, but close to it. You don't set this yourself; Spark does it for you based on your memory requirements.
  • reservedMemory will be defined by RESERVED_SYSTEM_MEMORY_BYTES (since we're not in a testing scenario), and this value is 314572800 bytes (300 MiB).
  • memoryFraction is 0.6 in the default scenario, which seems to be the case for you.

The final calculation is thus: `(systemMemory - reservedMemory) * memoryFraction`
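As a sanity check, that formula is easy to replicate in plain Python. This is only a sketch: the function name is mine, and feeding --driver-memory in directly is an approximation, since the real input is the JVM's Xmx, which is slightly smaller:

```python
RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024  # 314572800, Spark's hard-coded reserve
MEMORY_FRACTION = 0.6                             # default spark.memory.fraction

def unified_memory(system_memory_bytes):
    """Replicates (systemMemory - reservedMemory) * memoryFraction."""
    usable = system_memory_bytes - RESERVED_SYSTEM_MEMORY_BYTES
    return int(usable * MEMORY_FRACTION)

# Using --driver-memory as a stand-in for the JVM's Xmx:
print(unified_memory(500 * 1024 * 1024))   # 125829120 bytes = 120 MiB
print(unified_memory(1000 * 1024 * 1024))  # 440401920 bytes = 420 MiB
```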

Now we can do our calculations!

Your first case

--driver-memory was 500M, so let's calculate:

  • (524288000 - 314572800) * 0.6 = 125829120 bytes = 120 MiB
  • since the Xmx value of your JVM process is close to, but smaller than, --driver-memory, this lands very close to the observed 110 MiB!

Your second case

--driver-memory was 1G, so let's calculate:

  • (1048576000 - 314572800) * 0.6 = 440401920 bytes = 420 MiB
  • since the Xmx value of your JVM process is close to, but smaller than, --driver-memory, this lands very close to the observed 413.9 MiB!
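You can also run the calculation backwards: starting from the memory figure the UI reports, estimate what the JVM's Xmx must have been. The helper name below is mine, purely for illustration:

```python
RESERVED = 300 * 1024 * 1024  # RESERVED_SYSTEM_MEMORY_BYTES
FRACTION = 0.6                # default spark.memory.fraction

def implied_xmx_mib(ui_memory_mib):
    """Invert (xmx - reserved) * fraction to recover the implied Xmx, in MiB."""
    ui_bytes = ui_memory_mib * 1024 * 1024
    return (ui_bytes / FRACTION + RESERVED) / 1024 / 1024

print(round(implied_xmx_mib(110), 1))    # ≈ 483.3 MiB, just under --driver-memory 500M
print(round(implied_xmx_mib(413.9), 1))  # ≈ 989.8 MiB, just under --driver-memory 1G
```

Both implied Xmx values come out a bit below the corresponding --driver-memory settings, which matches the explanation above.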

huangapple
  • Published on 2023-03-03 21:14:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627550.html