Create a new Databricks cluster from an ADF linked service with init scripts from ABFSS (Azure Blob Storage)
Question
Recently, Databricks deprecated the DBFS init script, and I attempted to configure a Databricks linked service with InitScripts from ABFSS in ADF. However, I encountered a "file not found" error.
The new cluster configuration is as follows:
I was able to achieve the desired result in Databricks because there is an option to configure the init script location type:
However, in ADF, I couldn't find a similar option:
Please assist me in resolving this issue. I need to create a new Databricks cluster with an init script read from Azure Blob Storage (ABFSS) for every pipeline execution.
Answer 1
Score: 1
- You can use a REST call to create the cluster with the required init script from ABFSS, and then use this cluster directly in the Databricks notebook activity.
- You can use a Web activity to call the Clusters 2.0 REST API to create a cluster as specified in this document, with an authentication header where you specify the bearer token (access token). The following is the request body you can use (you might have to add further cluster configuration as well):
{
  "num_workers": null,
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "cluster_name": "cluster1",
  "spark_version": "7.3.x-scala2.12",
  "spark_conf": {},
  "node_type_id": "Standard_D3_v2",
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 120,
  "init_scripts": "<script_path_here_as_specified_in_document>"
}
- Use a Wait activity until the cluster is created. I have set the wait time to 300 seconds.
- Finally, use the cluster_id returned by the Web activity to run the notebook on the newly created cluster, as @activity('Web1').output.cluster_id.
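As a sketch of the Web-activity step above, the same cluster-create call can be reproduced outside ADF with a few lines of Python. The workspace URL, token, and ABFSS path here are placeholders, and the init_scripts value uses the abfss destination shape from the Clusters API reference; adjust both to your environment.

```python
import json
import urllib.request


def build_create_cluster_body(init_script_abfss_path: str) -> dict:
    """Request body for POST /api/2.0/clusters/create with an ABFSS init script."""
    return {
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "cluster_name": "cluster1",
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_D3_v2",
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
        "autotermination_minutes": 120,
        # The Clusters API takes a list of script locations; an ABFSS
        # location is wrapped in an object with a "destination" field.
        "init_scripts": [{"abfss": {"destination": init_script_abfss_path}}],
    }


def create_cluster(workspace_url: str, token: str, body: dict) -> str:
    """Call the Clusters 2.0 create endpoint and return the new cluster_id."""
    req = urllib.request.Request(
        f"{workspace_url}/api/2.0/clusters/create",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["cluster_id"]
```

The same body works verbatim as the Web activity's payload in ADF; only the bearer token header moves into the activity's authentication settings.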
Answer 2
Score: 0
If you specified the init script on abfss://..., then you need to specify the corresponding spark.hadoop.fs... configurations in the "Cluster Spark conf" section (fill in the data in <...> with the corresponding values):
spark.hadoop.fs.azure.account.auth.type set to OAuth
spark.hadoop.fs.azure.account.oauth.provider.type set to org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint set to https://login.microsoftonline.com/<tenant_id>/oauth2/token
spark.hadoop.fs.azure.account.oauth2.client.id set to <client_id>
spark.hadoop.fs.azure.account.oauth2.client.secret set to {{secrets/<secret-scope>/<secret-key-name>}}
(it will fetch the key from the secret scope that contains the client secret)
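The key-value pairs above can be assembled programmatically when building the cluster's Spark conf, e.g. for the new-cluster settings of the linked service. A minimal sketch, assuming a service principal whose secret lives in a Databricks secret scope; all argument values are placeholders.

```python
def abfss_oauth_spark_conf(tenant_id: str, client_id: str,
                           secret_scope: str, secret_key: str) -> dict:
    """Spark conf entries that let the cluster read its ABFSS init script
    via OAuth. These go into the cluster's Spark config as key/value pairs."""
    return {
        "spark.hadoop.fs.azure.account.auth.type": "OAuth",
        "spark.hadoop.fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "spark.hadoop.fs.azure.account.oauth2.client.id": client_id,
        # The {{secrets/...}} reference is resolved by Databricks from the
        # secret scope at cluster start; the raw secret never appears here.
        "spark.hadoop.fs.azure.account.oauth2.client.secret":
            f"{{{{secrets/{secret_scope}/{secret_key}}}}}",
    }
```

Keeping the secret as a {{secrets/...}} reference rather than a literal value means the configuration can be checked into source control safely.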
Comments