(Apache Beam) Cannot increase executor memory - it is fixed at 1024M despite using multiple settings
Question
I am running an Apache Beam workload on Spark. I initialized the workers with 32GB of memory (slaves run with -c 2 -m 32G). spark-submit sets driver memory to 30g and executor memory to 16g. However, executors fail with java.lang.OutOfMemoryError: Java heap space.
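For reference, those two values correspond to the standard spark-submit flags; a sketch of the submission (not the exact command used, remaining arguments omitted):
spark-submit --driver-memory 30g --executor-memory 16g ...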
The master GUI indicates that memory per executor is 1024M. In addition, I see that all Java processes are launched with -Xmx1024m. This means spark-submit doesn't propagate its executor settings to the executors.
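(One way to check those flags on a worker, if useful; the ps/grep invocation below is just a sketch:)
ps -ef | grep java | grep -o -- '-Xmx[0-9]*[mgk]'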
Pipeline options are as follows:
--runner PortableRunner \
--job_endpoint=localhost:8099 \
--environment_type=PROCESS \
--environment_config='{"command": "$HOME/beam/sdks/python/container/build/target/launcher/linux_amd64/boot"}'
The job endpoint is set up in the default way:
docker run --rm --network=host --name spark-jobservice apache/beam_spark_job_server:latest --spark-master-url=spark://$HOSTNAME:7077
How do I make sure the settings propagate to the executors?
Update:
I set conf/spark-defaults.conf to
spark.driver.memory 32g
spark.executor.memory 32g
and conf/spark-env.sh to
SPARK_EXECUTOR_MEMORY=32g
and restarted the cluster and relaunched everything, but executor memory is still limited to 1024M.
Answer 1
Score: 3
I found the reason and a workaround.
The job server container runs its own Spark distribution internally, so the settings configured in the Spark distribution on your local machine have no effect.
The solution is thus to change the configuration in the job server container, for instance by passing the corresponding environment variable when launching it:
docker run -e SPARK_EXECUTOR_MEMORY=32g --rm --network=host --name spark-jobservice apache/beam_spark_job_server:latest --spark-master-url=spark://$HOSTNAME:7077
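To confirm the new limit actually reaches the executors after resubmitting the pipeline, a rough check (just a sketch) is to look at the executor JVM processes on a worker:
ps -ef | grep CoarseGrainedExecutorBackend | grep -o -- '-Xmx[0-9]*[mgk]'
The master GUI should likewise report the larger memory per executor.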