Spark session value not updating
Question
I am setting Spark session configuration values using the code below:
from pyspark.sql import SparkSession

# Session factory; the enclosing function is implied by the original `return spark`,
# and the name get_spark is illustrative.
def get_spark():
    spark = (SparkSession
             .builder
             .appName('LoadDev1')
             #.config("spark.master","local[2]")
             .config("spark.master","yarn")
             .config("spark.yarn.queue","uldp")
             .config("spark.tez.queue","uldp")
             .config("spark.executor.instances","5")
             .enableHiveSupport()
             .getOrCreate()
             )
    return spark
spark-submit \
  --jars /app/spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.1.0.jar \
  --conf spark.sql.shuffle.partitions=100 \
  --conf spark.hive.vectorized.execution.enabled=false \
  --py-files /home/path/SparkFactory_iceberg1.py
But when I print the values inside my program, for example spark.executor.instances, the value is 10. I can see that the appName changes when I change the config file, which makes me believe the config file is indeed read, but somehow the values are overwritten.
If I provide a value using --conf it is reflected, but I want to use the config file rather than --conf.
Please help me with this.
Answer 1
Score: 1
spark-submit is a completely separate application from the session you create at the top, so you need to pass those configs to the spark-submit command. You can create a properties file, which will override the default Spark config at conf/spark-defaults.conf, with configs like this:
app.conf
spark.master yarn
spark.yarn.queue uldp
spark.tez.queue uldp
spark.executor.instances 5
spark.sql.shuffle.partitions 100
spark.hive.vectorized.execution.enabled false
$ spark-submit \
--properties-file <PATH>/app.conf \
--jars /app/spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.1.0.jar \
--py-files /home/path/SparkFactory_iceberg1.py \
/home/path/main.py
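To confirm the properties file was picked up, you can dump the effective configuration from inside the driver (a minimal sketch; SparkConf.getAll() is the standard way to list the resolved settings):

# After getOrCreate(), list the settings the application actually resolved.
# Values from --properties-file should appear here instead of the defaults.
for key, value in spark.sparkContext.getConf().getAll():
    if key in ("spark.executor.instances", "spark.yarn.queue", "spark.sql.shuffle.partitions"):
        print(key, "=", value)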