Spark session value not updating

Question

I am setting the Spark session values with the code below:

    from pyspark.sql import SparkSession

    def get_spark():  # function name assumed; the original snippet showed only the body
        spark = (SparkSession
                 .builder
                 .appName('LoadDev1')
                 #.config("spark.master","local[2]")
                 .config("spark.master","yarn")
                 .config("spark.yarn.queue","uldp")
                 .config("spark.tez.queue","uldp")
                 .config("spark.executor.instances","5")
                 .enableHiveSupport()
                 .getOrCreate()
                 )
        return spark

and the job is submitted with:

    spark-submit --jars /app/spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.1.0.jar \
        --conf spark.sql.shuffle.partitions=100 \
        --conf spark.hive.vectorized.execution.enabled=false \
        --py-files /home/path/SparkFactory_iceberg1.py

But when I print the values inside my program, for example, spark.executor.instances comes back as 10, even though I can see the appName change when I edit the config file, which makes me believe the config file is indeed read but somehow the values get overwritten.
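
For reference, this is roughly how such a check can look at runtime (a minimal sketch; it assumes spark is the session returned by the builder above, and uses PySpark's spark.conf.get accessor):

    # Print the effective configuration values the driver actually sees;
    # spark.conf.get(key, default) falls back to default if the key is unset.
    for key in ("spark.master",
                "spark.yarn.queue",
                "spark.executor.instances",
                "spark.sql.shuffle.partitions"):
        print(key, "=", spark.conf.get(key, "<unset>"))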

If I provide a value using --conf it is reflected, but I want to use the config file rather than --conf.

Please help me with this.

Answer 1

Score: 1

spark-submit is a completely separate application from the session you create at the top, so you need to pass those configs to the spark-submit command. You can put them in a properties file, which overrides the default Spark config at conf/spark-defaults.conf, with entries like this:

app.conf

    spark.master yarn
    spark.yarn.queue uldp
    spark.tez.queue uldp
    spark.executor.instances 5
    spark.sql.shuffle.partitions 100
    spark.hive.vectorized.execution.enabled false

and pass it to spark-submit with --properties-file:

    $ spark-submit \
        --properties-file <PATH>/app.conf \
        --jars /app/spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.1.0.jar \
        --py-files /home/path/SparkFactory_iceberg1.py \
        /home/path/main.py
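
Once the settings live in app.conf, the hard-coded .config(...) calls in the application are redundant. As a minimal sketch (assuming every cluster setting now comes from the properties file), the builder could shrink to:

    from pyspark.sql import SparkSession

    # master, queues and executor count are now supplied via
    # --properties-file at submit time, so the code no longer sets them.
    spark = (SparkSession
             .builder
             .appName('LoadDev1')
             .enableHiveSupport()
             .getOrCreate())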
