How to create an Azure Databricks job of type Python wheel by using the Azure Databricks API

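Question

I would like to create a Databricks job of type "python wheel" in Azure by using the Databricks API. I have a Python wheel that I need to execute in this job.

This question is related to [my other question on Stack Overflow](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform); only the technology used to implement it has changed.

Following the [Azure Databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs), I know how to create a Databricks job that executes a notebook. What I need, however, is a Databricks job of type "python wheel".

All my code is implemented in a Python wheel, and it needs to run 24/7. According to the requirements from the development team, the job must be of type "python wheel", not "notebook".

The [Databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) already shows how to create a job of type "python wheel" from the Databricks workspace. However, I need to automate this in a DevOps pipeline, so I would like to do it by calling the Databricks API. Below is the code I use to create a Databricks job; it executes a notebook. As mentioned, I need to run a "python wheel" job, as explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). This type of job looks like this in the workspace:

[![Databricks job of type python wheel](https://i.stack.imgur.com/kBBPU.png)](https://i.stack.imgur.com/kBBPU.png)

My current code is below. My objective is to change it to run a Python wheel instead of a notebook: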

import requests
import os

# both 2.0 and 2.1 API can create job.
dbrks_create_job_url = "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net/api/2.1/jobs/create"

DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/'+ os.environ['DBRKS_SUBSCRIPTION_ID'] +'/resourceGroups/'+ os.environ['DBRKS_RESOURCE_GROUP'] +'/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']
}

CLUSTER_ID = "\"" + os.environ["DBRKS_CLUSTER_ID"] + "\""
NOTEBOOK_LOCATION = "\"" + os.environ["NOTEBOOK_LOCATION"] + "test-notebook" + "\""
print("Notebook path is {}".format(NOTEBOOK_LOCATION))
print(CLUSTER_ID)

body_json = """
    {
    "name": "A sample job to trigger from DevOps",
    "tasks": [
        {
        "task_key": "ExecuteNotebook",
        "description": "Execute uploaded notebook including tests",
        "depends_on": [],
        "existing_cluster_id": """ + CLUSTER_ID + """,
        "notebook_task": {
          "notebook_path": """ + NOTEBOOK_LOCATION + """,
          "base_parameters": {}
        },
        "timeout_seconds": 300,
        "max_retries": 1,
        "min_retry_interval_millis": 5000,
        "retry_on_timeout": false
      }
    ],
    "email_notifications": {},
    "name": "Run_Unit_Tests",
    "max_concurrent_runs": 1}
"""
print("Request body in json format:")
print(body_json)

response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)

if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
    print("Job Id = {}".format(response.json()['job_id']))
    print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{}".format(response.json()['job_id']))
else:
    print("job failed!")
    raise Exception(response.content)

# Answer 1
**Score**: 1
As already mentioned by @Alex Ott, you need to use `python_wheel_task` instead of `notebook_task`.
This is based on the [Jobs API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
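
The swap this answer describes can be sketched on a single task object. This is only an illustration: `to_wheel_task` is a hypothetical helper, and `my_package` and `main` are placeholder values for the wheel's package name and entry point, not names taken from the question.

```python
import copy


def to_wheel_task(task, package_name, entry_point):
    """Return a copy of a Jobs API task with notebook_task swapped for python_wheel_task."""
    wheel_task = copy.deepcopy(task)  # leave the caller's dict untouched
    wheel_task.pop("notebook_task", None)
    wheel_task["python_wheel_task"] = {
        "package_name": package_name,
        "entry_point": entry_point,
    }
    return wheel_task


# A task shaped like the one in the question (path and cluster id are placeholders).
notebook_task = {
    "task_key": "ExecuteNotebook",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/path/to/test-notebook", "base_parameters": {}},
    "timeout_seconds": 300,
}

print(to_wheel_task(notebook_task, "my_package", "main"))
```

All other task settings (cluster, timeout, retries) carry over unchanged; only the task-type key differs.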
# Answer 2
**Score**: 0
It's simple: instead of `notebook_task`, use `python_wheel_task` as described in the REST API docs, and provide the `package_name` and `entry_point` parameters inside that JSON object.
Don't forget to add the wheel file in the `libraries` block.
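
As a rough sketch of what such a request body could look like, the snippet below builds the payload as a Python dict rather than a hand-concatenated string. The function name `build_wheel_job`, the job name, the task key, the package name `my_package`, the entry point `main`, and the DBFS wheel path are all assumptions made for illustration; substitute the values of your own wheel and workspace.

```python
import json
import os


def build_wheel_job(cluster_id, package_name, entry_point, wheel_uri, parameters=None):
    """Build a jobs/create request body containing a single python_wheel_task."""
    return {
        "name": "Run_Python_Wheel",  # hypothetical job name
        "max_concurrent_runs": 1,
        "tasks": [
            {
                "task_key": "ExecuteWheel",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,
                    # Positional arguments handed to the entry point, if any.
                    "parameters": list(parameters or []),
                },
                # Attach the wheel as a library so the cluster installs it for the task.
                "libraries": [{"whl": wheel_uri}],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_wheel_job(
        cluster_id=os.environ.get("DBRKS_CLUSTER_ID", "<cluster-id>"),
        package_name="my_package",  # placeholder
        entry_point="main",  # placeholder
        wheel_uri="dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl",  # placeholder
    )
    print(json.dumps(payload, indent=2))
```

The resulting dict can be sent with the URL and headers from the question, for example `requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=payload)`. Passing the dict through `json=` lets `requests` serialize it, which avoids the manual quote handling around `CLUSTER_ID` in the original string-based body.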

Posted by huangapple on 2023-02-27 19:13:48. Source: https://go.coder-hub.com/75579739.html