How to create an Azure Databricks job of type "python wheel" using the Azure Databricks API

# Question
I would like to **create a Databricks job of type "python wheel"** in Azure by using the **Databricks API**. I have a Python wheel that I need to execute in this job.

This question is related to my other question at [this stackoverflow link](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform); only the technology used to implement it has changed.

Following the [Azure Databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs) I know how to create a Databricks job that can execute a notebook. However, what I need is a **Databricks job** of **type "python wheel"**.

All my code is implemented in a Python wheel and it needs to run 24/7. According to the requirements I got from the development team, the job must be of type "python wheel", not "notebook".

As the [Databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) shows, a job of type "python wheel" can be created from the Databricks workspace. I, however, need to automate this process in a DevOps pipeline, which is why I would like to do it by calling the Databricks API. Below is the code I have implemented to create a Databricks job; it currently uses a notebook to execute the code. As mentioned, I need to run a "python wheel" job, just as explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). This is what such a job looks like in the workspace:

[![Databricks job of type python wheel][1]][1]

My current code is below. **My objective is to change it to run a Python wheel instead of a notebook**:
```python
import requests
import os

# Both the 2.0 and 2.1 Jobs APIs can create a job; 2.1 adds multi-task jobs.
dbrks_create_job_url = "https://" + os.environ['DBRKS_INSTANCE'] + ".azuredatabricks.net/api/2.1/jobs/create"

DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id':
        '/subscriptions/' + os.environ['DBRKS_SUBSCRIPTION_ID']
        + '/resourceGroups/' + os.environ['DBRKS_RESOURCE_GROUP']
        + '/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']
}

# Wrap the values in escaped quotes so they become valid JSON strings after concatenation.
CLUSTER_ID = "\"" + os.environ["DBRKS_CLUSTER_ID"] + "\""
NOTEBOOK_LOCATION = "\"" + os.environ["NOTEBOOK_LOCATION"] + "test-notebook" + "\""
print("Notebook path is {}".format(NOTEBOOK_LOCATION))
print(CLUSTER_ID)

body_json = """
{
    "name": "A sample job to trigger from DevOps",
    "tasks": [
        {
            "task_key": "ExecuteNotebook",
            "description": "Execute uploaded notebook including tests",
            "depends_on": [],
            "existing_cluster_id": """ + CLUSTER_ID + """,
            "notebook_task": {
                "notebook_path": """ + NOTEBOOK_LOCATION + """,
                "base_parameters": {}
            },
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": false
        }
    ],
    "email_notifications": {},
    "max_concurrent_runs": 1
}
"""

print("Request body in json format:")
print(body_json)

response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)

if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
    print("Job Id = {}".format(response.json()['job_id']))
    # Expose the job id to later Azure DevOps pipeline steps.
    print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{}".format(response.json()['job_id']))
else:
    print("Job creation failed!")
    raise Exception(response.content)
```
I would appreciate any help changing this code so that it creates a Databricks job of type "python wheel".
[1]: https://i.stack.imgur.com/kBBPU.png
# Answer 1

**Score**: 1

As already mentioned by @Alex Ott, instead of using a `notebook_task` you need to use a `python_wheel_task`.

Based on the [Job API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
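For illustration, here is a minimal sketch of how the request body from the question could be rewritten for a wheel task, following the Jobs 2.1 schema. The package name `my_package`, the entry point `main`, and the DBFS wheel path are placeholder values to replace with your own:

```python
# Sketch: the same request body as in the question, with the notebook_task
# swapped for a python_wheel_task. Placeholder values are marked below.
body_json = """
{
    "name": "A sample job to trigger from DevOps",
    "tasks": [
        {
            "task_key": "ExecuteWheel",
            "description": "Execute the python wheel",
            "depends_on": [],
            "existing_cluster_id": """ + CLUSTER_ID + """,
            "python_wheel_task": {
                "package_name": "my_package",
                "entry_point": "main"
            },
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/my_package-1.0-py3-none-any.whl"}
            ],
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": false
        }
    ],
    "email_notifications": {},
    "max_concurrent_runs": 1
}
"""
```

The rest of the script from the question (headers, POST request, and response handling) stays unchanged.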
# Answer 2

**Score**: 0

It's simple: instead of `notebook_task` you just need to use a `python_wheel_task`, as described in the REST API docs. You also need to provide the `package_name` and `entry_point` parameters inside the JSON object.

And don't forget to add the wheel file in the `libraries` block.
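As a variation on the sketch in the first answer, the body can also be built as a Python dict and serialized with `json.dumps`, which avoids the fragile string concatenation used in the question. This reuses `dbrks_create_job_url` and `DBRKS_REQ_HEADERS` from the question's script; the package name, entry point, and wheel path are again placeholders:

```python
import json
import os
import requests

# Placeholder values: replace with your own package, entry point, and wheel path.
job_spec = {
    "name": "Run python wheel from DevOps",
    "tasks": [
        {
            "task_key": "ExecuteWheel",
            "existing_cluster_id": os.environ["DBRKS_CLUSTER_ID"],
            "python_wheel_task": {
                "package_name": "my_package",
                "entry_point": "main",
            },
            # The wheel itself is attached to the task via the libraries block.
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/my_package-1.0-py3-none-any.whl"}
            ],
        }
    ],
    "max_concurrent_runs": 1,
}

response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS,
                         data=json.dumps(job_spec))
```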