Spark Java: Cannot change driver memory
Question
So, I have a Spark standalone cluster with 16 worker nodes and one master node. I start the cluster with the "sh start-all.sh" command from the spark_home/conf folder on the master node. The master node has 32 GB RAM and 14 vCPUs, while each worker node has 16 GB RAM and 8 vCPUs. I also have a Spring application which, when it starts (with java -jar app.jar), initializes the Spark context. The spark-env.sh file is:
export SPARK_MASTER_HOST='192.168.100.17'
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=14000mb
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=172800 -Dspark.worker.cleanup.appDataTtl=172800'
I do not have anything in spark-defaults.conf, and the code that initializes the Spark context programmatically is:
@Bean
public SparkSession sparksession() {
    SparkSession sp = SparkSession
            .builder()
            .master("spark://....")
            .config("spark.cassandra.connection.host", "192.168.100......")
            .appName("biomet")
            .config("spark.driver.memory", "20g")
            .config("spark.driver.maxResultSize", "10g")
            .config("spark.sql.shuffle.partitions", 48)
            .config("spark.executor.memory", "7g")
            .config("spark.sql.pivotMaxValues", "50000")
            .config("spark.sql.caseSensitive", true)
            .config("spark.executor.extraClassPath", "/home/ubuntu/spark-2.4.3-bin-hadoop2.7/jars/guava-16.0.1.jar")
            .config("spark.hadoop.fs.s3a.access.key", "...")
            .config("spark.hadoop.fs.s3a.secret.key", "...")
            .getOrCreate();
    return sp;
}
After all this, the Environment tab of the Spark UI shows spark.driver.maxResultSize as 10g and spark.driver.memory as 20g, BUT the Executors tab shows the driver's storage memory as 0.0 B / 4.3 GB.
(FYI: I used to set spark.driver.memory to 10g programmatically, and the Executors tab showed 4.3 GB, but now it seems I cannot change it. Even when I had it at 10g, though, shouldn't it have given me more than 4.3 GB?!)
How can I change the driver memory? I tried setting it in spark-defaults.conf, but nothing changed. Even if I do not set the driver memory at all (or set it to less than 4.3 GB), the Executors tab still shows 4.3 GB.
Answer 1
Score: 0
I suspect that you're running your application in client mode; in that case, per the documentation:
> Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command line option in the client mode. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point.
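For illustration, the two modes differ like this when submitting with spark-submit (the --master, --deploy-mode, --driver-memory and --conf flags are standard spark-submit options; app.jar and com.example.Main are placeholders):
spark-submit --master spark://... --deploy-mode cluster --conf spark.driver.memory=20g --class com.example.Main app.jar
spark-submit --master spark://... --deploy-mode client --driver-memory 20g --class com.example.Main app.jar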
In the current case, the Spark job is submitted from the application itself, so the application is the driver, and its memory is regulated the way it is for any Java application: via -Xmx, etc.
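So, assuming the application really is started with java -jar app.jar as described, the driver heap has to be set on that JVM invocation, and the spark.driver.memory value set in the builder will have no effect on the heap, e.g.:
java -Xmx20g -jar app.jar
As a rough sanity check on the 4.3 GB figure: the Storage Memory column of the Executors tab reflects the unified memory pool, roughly (heap size - 300 MB reserved) * spark.memory.fraction (0.6 by default). If the driver JVM was started without -Xmx, the default maximum heap is typically about a quarter of the machine's RAM (roughly 8 GB on the 32 GB master), which works out to the 4-4.6 GB range observed; raising -Xmx should move that number.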
Comments