Externalize Spark Configurations
Question
I need to externalize the Spark Configs in our job.conf files so that they can be read from an external location and modified only in that one external location to use at runtime.
Configs such as
- spark.executor.memory
- spark.executor.cores
- spark.executor.instances
- spark.sql.adaptive.enabled
- spark.sql.legacy.timeParserPolicy

would be stored in this file.
I am very new to this and am finding very limited resources on the web about handling this process. I've seen a couple of YouTube videos about using a Scala file to handle this. Any assistance would be greatly appreciated.
I have attempted to emulate the Scala examples I have seen online, but don't know how to call the resulting file from Spark (or even whether the Scala is correct to begin with).
Answer 1
Score: 1
TL;DR:
- you can put your config in `$SPARK_HOME/conf/spark-defaults.conf`,
- or, if you're submitting your jobs explicitly using `spark-submit` or similar, you can also pass them on the command line using `--conf`.
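The second option can be sketched as a `spark-submit` invocation. The class and jar names below are hypothetical placeholders, and the command is composed as a string so the sketch stands alone without a Spark installation:

```shell
# Each --conf flag sets one Spark property for this run only.
# com.example.MyJob and my-job.jar are placeholders for your own job.
SUBMIT_CMD="spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.sql.adaptive.enabled=true \
  --class com.example.MyJob \
  my-job.jar"

# Printed rather than executed here, since running it requires a Spark install.
echo "$SUBMIT_CMD"
```

Command-line `--conf` values take precedence over the same keys in spark-defaults.conf, which makes this handy for per-run overrides.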
Spark configuration docs leave a bit to be desired.
As described in the Dynamically Loading Spark Properties section:
> bin/spark-submit will also read configuration options from conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. For example:
>
>     spark.master            spark://5.6.7.8:7077
>     spark.executor.memory   4g
>     spark.eventLog.enabled  true
>     spark.serializer        org.apache.spark.serializer.KryoSerializer
The official documentation doesn't explicitly mention the location except in passing, in a paragraph related to Hadoop configuration. Some IBM documentation states it more explicitly.
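For the question's actual goal, a single external file read at runtime, `spark-submit` also accepts a `--properties-file` flag pointing at any path, so the settings need not live under `$SPARK_HOME/conf` at all. A minimal sketch, where the `/tmp` path and the class/jar names are assumptions:

```shell
# Keep the job's Spark settings in one external file, outside the application.
mkdir -p /tmp/spark-external-conf
cat > /tmp/spark-external-conf/job.conf <<'EOF'
spark.executor.memory              4g
spark.executor.cores               2
spark.executor.instances           10
spark.sql.adaptive.enabled         true
spark.sql.legacy.timeParserPolicy  LEGACY
EOF

# Submit against that file instead of conf/spark-defaults.conf
# (commented out: running it needs a Spark installation; names are placeholders).
# spark-submit \
#   --properties-file /tmp/spark-external-conf/job.conf \
#   --class com.example.MyJob \
#   my-job.jar
```

Editing job.conf then changes the settings for every subsequent run without touching the application itself.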