March 15, 2023 18:14:17


Azure Databricks: How to add Spark configuration in Databricks workspace level?

Question

I want to add some Spark configuration at the Databricks workspace level so that it is applied to all clusters in the workspace.

A sample global init script for this would be helpful.

Answer 1

Score: 0


You can set Spark configurations at different levels.
Step 1:
Apply the configuration at cluster level, using a global init script so that it runs on every cluster.

Create a sample global init script that sets the spark.sql.shuffle.partitions configuration to 100.
Open a text editor such as Notepad, create a new file named set-spark-config.sh, paste in the code below, and save it.

Code:

#!/usr/bin/env bash

echo "Setting Spark configuration..."
echo "spark.sql.shuffle.partitions 100" >> /databricks/spark/conf/spark-defaults.conf
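If the script ever runs against a file that already contains the property, appending with `>>` leaves duplicate lines. Below is a hedged variant that replaces an existing entry instead. `set_spark_conf` is a hypothetical helper (not a Databricks API), the conf path is the one used in this answer, and the directory check exists only so the script can be tried outside Databricks without touching anything.

```shell
#!/usr/bin/env bash
# Variant of set-spark-config.sh that replaces an existing entry instead of
# appending a duplicate. The conf path is the one used in the answer.
set -euo pipefail

CONF_FILE="/databricks/spark/conf/spark-defaults.conf"

# set_spark_conf FILE KEY VALUE  (hypothetical helper, not a Databricks API)
# Drops any existing line whose first field is KEY, then appends "KEY VALUE".
set_spark_conf() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  awk -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  # Copy back with cat so the original file's owner and permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

if [[ -d "$(dirname "$CONF_FILE")" ]]; then
  echo "Setting Spark configuration..."
  set_spark_conf "$CONF_FILE" spark.sql.shuffle.partitions 100
else
  # Not running on a Databricks node (e.g. a local test): change nothing.
  echo "Skipping: $(dirname "$CONF_FILE") does not exist" >&2
fi
```

Running the helper twice with the same key leaves exactly one line for that key, so the script is safe to re-run.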

Upload set-spark-config.sh to DBFS.

In Databricks, navigate to Admin Console / Global Init Scripts / Add Script.

Give the script a name, for example Set Configuration, and provide the path, for example /FileStore/tables/set-spark-config.sh.
Please refer to the screenshot.
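If you would rather script this step than click through the Admin Console, here is a minimal sketch. It assumes the Databricks Global Init Scripts REST API (`POST /api/2.0/global-init-scripts`), which takes the script content itself, base64-encoded, rather than a DBFS path. It also assumes two environment variables, `DATABRICKS_HOST` (workspace URL) and `DATABRICKS_TOKEN` (personal access token). `build_payload` is a hypothetical helper; check the field names against the API reference before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: register set-spark-config.sh as a global init script via the REST API.
# Assumes POST /api/2.0/global-init-scripts with fields name, script, enabled.
set -euo pipefail

# build_payload NAME SCRIPT_FILE  (hypothetical helper)
# Prints the JSON request body; the script content is sent base64-encoded.
# Assumption: NAME contains no characters that need JSON escaping.
build_payload() {
  local name="$1" script_file="$2" encoded
  # tr removes the line wrapping that GNU base64 adds every 76 characters.
  encoded="$(base64 < "$script_file" | tr -d '\n')"
  printf '{"name": "%s", "script": "%s", "enabled": true}\n' "$name" "$encoded"
}

# Only call the API when both (assumed) environment variables are set,
# e.g. DATABRICKS_HOST=https://adb-1234567890123456.7.azuredatabricks.net
if [[ -n "${DATABRICKS_HOST:-}" && -n "${DATABRICKS_TOKEN:-}" ]]; then
  build_payload "Set Configuration" set-spark-config.sh |
    curl --fail -sS -X POST "$DATABRICKS_HOST/api/2.0/global-init-scripts" \
      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
      -H "Content-Type: application/json" \
      --data @-
fi
```

Sending the content rather than a path means later edits to the file on DBFS are not picked up automatically; you would re-run the request (or edit the script in the Admin Console).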


Once the global init script is enabled, it runs on every cluster in the workspace each time the cluster starts, so spark.sql.shuffle.partitions is set to 100 for all Spark jobs running on those clusters.

Note: Global init scripts run at cluster startup, so any change to the configuration takes effect only after the clusters are restarted.
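To confirm the value after the clusters restart, you can read it back from the same file, for example from a `%sh` notebook cell (which runs on the driver). This is a sketch: `get_spark_conf` is a hypothetical helper, and it assumes the whitespace-separated `key value` lines written by the init script. Spark loads this file with `java.util.Properties`, where a later duplicate key overrides an earlier one, so the helper reports the last match.

```shell
#!/usr/bin/env bash
# Sketch: print the value that a spark-defaults.conf file assigns to a key.
set -euo pipefail

# get_spark_conf FILE KEY  (hypothetical helper)
# Prints the value from the last "KEY value" line; exits 1 if KEY is absent.
get_spark_conf() {
  local file="$1" key="$2"
  awk -v k="$key" '
    $1 == k { v = $0; sub(/^[^ \t]+[ \t]+/, "", v); found = 1 }
    END { if (found) print v; else exit 1 }
  ' "$file"
}

CONF_FILE="/databricks/spark/conf/spark-defaults.conf"
if [[ -r "$CONF_FILE" ]]; then
  get_spark_conf "$CONF_FILE" spark.sql.shuffle.partitions || echo "not set" >&2
fi
```

This only shows what the file says. In a Python notebook, `spark.conf.get("spark.sql.shuffle.partitions")` returns the value the running session actually uses.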

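Step 2:
In Databricks, navigate to Admin Console / Global Init Scripts / Add Script.
Name the script, for example Set Configuration01.
In the Script area, enter the following. A global init script is run as a shell script, so the property has to be written to the Spark defaults file; a bare property line on its own would fail as an unknown command.

#!/usr/bin/env bash
echo "spark.sql.execution.arrow.pyspark.enabled true" >> /databricks/spark/conf/spark-defaults.conf

Save and enable the script.

Note: This applies the configuration to all clusters in the workspace, and therefore to every notebook attached to them, once the clusters restart.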
