How to create an Azure Databricks job of type Python wheel by using the Azure Databricks API

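Question

I would like to **create a Databricks job of type "python wheel"** in Azure by using the **Databricks API**. I have a Python wheel that I need to execute in this job.

This question is related to my other question at [this stackoverflow link](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform); only the technology used to implement it has changed.

Following the [Azure Databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs), I know how to create a Databricks job that executes a notebook. However, what I need is a **Databricks job** of **type "python wheel"**.

All my code is implemented in a Python wheel, and it needs to run 24/7. According to the requirements from the development team, the job must be of type "python wheel" and not "notebook".

The [Databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) already shows how to create a job of type python wheel from the Databricks workspace. I, however, need to automate this process in a DevOps pipeline, which is why I would like to do it by calling the Databricks API. Below is the code I have implemented to create a Databricks job; it uses a notebook to execute the code. As mentioned, I need to run a "python wheel" job as explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). This type of job looks like this in the workspace:

![Databricks job of type python wheel](https://i.stack.imgur.com/kBBPU.png)

My current code is below. **My objective is to change it to run a Python wheel instead of a notebook**: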

    import json
    import os

    import requests

    # Both the 2.0 and 2.1 Jobs APIs can create a job.
    dbrks_create_job_url = "https://" + os.environ['DBRKS_INSTANCE'] + ".azuredatabricks.net/api/2.1/jobs/create"

    DBRKS_REQ_HEADERS = {
        'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
        'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/' + os.environ['DBRKS_SUBSCRIPTION_ID'] + '/resourceGroups/' + os.environ['DBRKS_RESOURCE_GROUP'] + '/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
        'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']
    }

    CLUSTER_ID = os.environ["DBRKS_CLUSTER_ID"]
    NOTEBOOK_LOCATION = os.environ["NOTEBOOK_LOCATION"] + "test-notebook"
    print("Notebook path is {}".format(NOTEBOOK_LOCATION))
    print(CLUSTER_ID)

    # Build the body as a dict and let requests serialize it, instead of
    # splicing pre-quoted strings into a JSON template.
    body = {
        # The original body set "name" twice; the later value is kept here.
        "name": "Run_Unit_Tests",
        "tasks": [
            {
                "task_key": "ExecuteNotebook",
                "description": "Execute uploaded notebook including tests",
                "depends_on": [],
                "existing_cluster_id": CLUSTER_ID,
                "notebook_task": {
                    "notebook_path": NOTEBOOK_LOCATION,
                    "base_parameters": {}
                },
                "timeout_seconds": 300,
                "max_retries": 1,
                "min_retry_interval_millis": 5000,
                "retry_on_timeout": False
            }
        ],
        "email_notifications": {},
        "max_concurrent_runs": 1
    }
    print("Request body in json format:")
    print(json.dumps(body, indent=4))

    response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=body)

    if response.status_code == 200:
        print("Job created successfully!")
        print(response.status_code)
        print(response.content)
        print("Job Id = {}".format(response.json()['job_id']))
        # Azure DevOps logging command: expose the job id as an output variable.
        print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{}".format(response.json()['job_id']))
    else:
        print("job failed!")
        raise Exception(response.content)



# Answer 1
**Score**: 1

As already mentioned by @Alex Ott, you need to use `python_wheel_task` instead of `notebook_task`.

This is based on the [Job API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
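
The swap can be sketched as a small helper that takes a task like the one in the question and returns a copy with `python_wheel_task` in place of `notebook_task`. This is only an illustration: the helper name `to_wheel_task` and the values `my_package` and `main` are placeholders, not names from the original post.

```python
import copy


def to_wheel_task(task, package_name, entry_point, parameters=None):
    """Return a copy of a Jobs API task with notebook_task replaced by python_wheel_task."""
    new_task = copy.deepcopy(task)
    # A task carries one task type, so drop the notebook definition.
    new_task.pop("notebook_task", None)
    new_task["python_wheel_task"] = {
        "package_name": package_name,
        "entry_point": entry_point,
        # Positional command-line arguments passed to the entry point.
        "parameters": list(parameters or []),
    }
    return new_task


# Task shaped like the one in the question (cluster id is a placeholder).
notebook_task = {
    "task_key": "ExecuteNotebook",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/path/test-notebook", "base_parameters": {}},
}

# "my_package" and "main" are hypothetical; use the wheel's real names.
wheel_task = to_wheel_task(notebook_task, "my_package", "main")
print(wheel_task)
```

The rest of the task (cluster, retries, timeouts) is left untouched, so only the task type changes.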




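# Answer 2
**Score**: 0

It's simple: instead of `notebook_task`, use `python_wheel_task` as described in the REST API docs. You also need to provide the `package_name` and `entry_point` parameters inside that JSON object.

And don't forget to add the wheel file in the `libraries` block.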
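
Putting these pieces together, a request body might look like the sketch below. It assumes the environment variables from the question (`DBRKS_INSTANCE`, `DBRKS_CLUSTER_ID`) and the same request headers; the job name, task key, wheel path, package name, and entry point are all placeholders that must be replaced with the values of the actual wheel.

```python
import json
import os


def build_wheel_job_body(cluster_id, wheel_path, package_name, entry_point):
    """Build a Jobs API 2.1 create-job body with a single python_wheel_task."""
    return {
        "name": "Run_Python_Wheel",  # hypothetical job name
        "tasks": [
            {
                "task_key": "ExecuteWheel",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,
                    "parameters": [],
                },
                # The wheel itself is attached to the task as a library.
                "libraries": [{"whl": wheel_path}],
            }
        ],
        "max_concurrent_runs": 1,
    }


def create_job(body, headers):
    """POST the body to the jobs/create endpoint and return the new job id."""
    import requests  # imported here so the body can be built without it

    url = "https://" + os.environ["DBRKS_INSTANCE"] + ".azuredatabricks.net/api/2.1/jobs/create"
    response = requests.post(url, headers=headers, json=body)
    response.raise_for_status()
    return response.json()["job_id"]


body = build_wheel_job_body(
    cluster_id=os.environ.get("DBRKS_CLUSTER_ID", "<cluster-id>"),
    wheel_path="dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl",  # placeholder path
    package_name="my_package",  # placeholder
    entry_point="main",  # placeholder
)
print(json.dumps(body, indent=2))
```

In the pipeline, `create_job(body, DBRKS_REQ_HEADERS)` would then send the request with the headers defined in the question. Note that `timeout_seconds` is deliberately left out here, since the question says the job has to run around the clock.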




Posted by huangapple on February 27, 2023, 19:13:48
Source: https://go.coder-hub.com/75579739.html