Spark Java: Cannot change driver memory
Question
So, I have a Spark standalone cluster with 16 worker nodes and one master node. I start the cluster with the "sh start-all.sh" command from the spark_home/conf folder on the master node. The master node has 32 GB RAM and 14 vCPUs, while each worker node has 16 GB RAM and 8 vCPUs. I also have a Spring application which, when it starts (with java -jar app.jar), initializes the Spark context. The spark-env.sh file is:
export SPARK_MASTER_HOST='192.168.100.17'
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=14000mb
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=172800 -Dspark.worker.cleanup.appDataTtl=172800'
I do not have anything in spark-defaults.conf, and the code that initializes the Spark context programmatically is:
@Bean
public SparkSession sparksession() {
    SparkSession sp = SparkSession
            .builder()
            .master("spark://....")
            .config("spark.cassandra.connection.host", "192.168.100......")
            .appName("biomet")
            .config("spark.driver.memory", "20g")
            .config("spark.driver.maxResultSize", "10g")
            .config("spark.sql.shuffle.partitions", 48)
            .config("spark.executor.memory", "7g")
            .config("spark.sql.pivotMaxValues", "50000")
            .config("spark.sql.caseSensitive", true)
            .config("spark.executor.extraClassPath", "/home/ubuntu/spark-2.4.3-bin-hadoop2.7/jars/guava-16.0.1.jar")
            .config("spark.hadoop.fs.s3a.access.key", "...")
            .config("spark.hadoop.fs.s3a.secret.key", "...")
            .getOrCreate();
    return sp;
}
After all this, the Environment tab of the Spark UI shows spark.driver.maxResultSize as 10g and spark.driver.memory as 20g, BUT the Executors tab shows the driver's storage memory as 0.0 B / 4.3 GB.
(FYI: I used to set spark.driver.memory to 10g programmatically, and the Executors tab showed 4.3 GB, but now it seems I cannot change it. Even when I had it at 10g, though, shouldn't it have given me more than 4.3 GB?!)
How can I change the driver memory? I tried setting it in spark-defaults.conf, but nothing changed. Even if I do not set the driver memory at all (or set it to less than 4.3 GB), the Executors tab still shows 4.3 GB.
Answer 1
Score: 0
I suspect that you're running your application in client mode; in that case, per the documentation:
> Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command line option in the client mode. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point.
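For illustration, the two modes differ like this when submitting with spark-submit (the --master, --deploy-mode, --driver-memory and --conf flags are standard spark-submit options; app.jar and com.example.Main are placeholders):
spark-submit --master spark://... --deploy-mode cluster --conf spark.driver.memory=20g --class com.example.Main app.jar
spark-submit --master spark://... --deploy-mode client --driver-memory 20g --class com.example.Main app.jar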
In the current case, the Spark job is submitted from the application itself, so the application is the driver, and its memory is regulated the way it is for any Java application: via -Xmx, etc.
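So, assuming the application really is started with java -jar app.jar as described, the driver heap has to be set on that JVM invocation, and the spark.driver.memory value set in the builder will have no effect on the heap, e.g.:
java -Xmx20g -jar app.jar
As a rough sanity check on the 4.3 GB figure: the Storage Memory column of the Executors tab reflects the unified memory pool, roughly (heap size - 300 MB reserved) * spark.memory.fraction (0.6 by default). If the driver JVM was started without -Xmx, the default maximum heap is typically about a quarter of the machine's RAM (roughly 8 GB on the 32 GB master), which works out to the 4-4.6 GB range observed; raising -Xmx should move that number.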
Comments