February 27, 2023 19:13:48

How to create an Azure Databricks job of type Python wheel by using the Azure Databricks API

# Question

I would like to **create a Databricks job of type "Python wheel"** in Azure by using the **Databricks API**. I have a Python wheel that I need to execute in this job.

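This question is related to my other question at [this Stack Overflow link](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform); only the technology used to implement it has changed.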

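Following the [Azure Databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs), I know how to create a Databricks job that executes a notebook. However, what I need is a **Databricks job** of **type "Python wheel"**.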

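All my code is implemented in a Python wheel, and it needs to run 24/7. According to the requirements I got from the development team, the job must be of type "Python wheel", not "notebook".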

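As the [Databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) shows, a job of type Python wheel can be created from the Databricks workspace. However, I need to automate this process in a DevOps pipeline, so I would like to do it by calling the Databricks API. The code I have implemented so far creates a job that executes a notebook, whereas I need a "Python wheel" job as explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). This is how that type of job appears in the workspace:

![Databricks job of type python wheel](https://i.stack.imgur.com/kBBPU.png)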

My current code is below. **My objective is to change it to run a Python wheel instead of a notebook**:

```python
import json
import os

import requests

# Both the 2.0 and 2.1 API can create a job; this request uses 2.1.
dbrks_create_job_url = "https://" + os.environ["DBRKS_INSTANCE"] + ".azuredatabricks.net/api/2.1/jobs/create"
DBRKS_REQ_HEADERS = {
    "Authorization": "Bearer " + os.environ["DBRKS_BEARER_TOKEN"],
    "X-Databricks-Azure-Workspace-Resource-Id": (
        "/subscriptions/" + os.environ["DBRKS_SUBSCRIPTION_ID"]
        + "/resourceGroups/" + os.environ["DBRKS_RESOURCE_GROUP"]
        + "/providers/Microsoft.Databricks/workspaces/" + os.environ["DBRKS_WORKSPACE_NAME"]
    ),
    "X-Databricks-Azure-SP-Management-Token": os.environ["DBRKS_MANAGEMENT_TOKEN"],
}
CLUSTER_ID = os.environ["DBRKS_CLUSTER_ID"]
NOTEBOOK_LOCATION = os.environ["NOTEBOOK_LOCATION"] + "test-notebook"
print("Notebook path is {}".format(NOTEBOOK_LOCATION))
print(CLUSTER_ID)

# Build the request body as a dict and let requests serialize it, instead of
# splicing hand-quoted strings into a JSON literal.
body = {
    "name": "Run_Unit_Tests",
    "tasks": [
        {
            "task_key": "ExecuteNotebook",
            "description": "Execute uploaded notebook including tests",
            "depends_on": [],
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {
                "notebook_path": NOTEBOOK_LOCATION,
                "base_parameters": {},
            },
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": False,
        }
    ],
    "email_notifications": {},
    "max_concurrent_runs": 1,
}
print("Request body in JSON format:")
print(json.dumps(body, indent=2))

response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=body)
if response.status_code == 200:
    job_id = response.json()["job_id"]
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
    print("Job Id = {}".format(job_id))
    # Expose the job ID as an output variable to later Azure DevOps pipeline steps.
    print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{}".format(job_id))
else:
    print("job failed!")
    raise Exception(response.content)
```

# Answer 1
**Score**: 1
As @Alex Ott already mentioned, instead of `notebook_task` you need to use `python_wheel_task`.
This is based on the [Jobs API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
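This suggestion can be sketched as a small transformation of the task object from the question's request body. It is a minimal sketch, assuming the Jobs API 2.1 `python_wheel_task` fields `package_name`, `entry_point` and an optional `parameters` list; the helper `replace_notebook_task` and the values `my_package`, `main`, `<cluster-id>` and `/Shared/test-notebook` are hypothetical placeholders.

```python
import copy


def replace_notebook_task(task, package_name, entry_point, parameters=None):
    """Return a copy of a Jobs API 2.1 task with notebook_task swapped for python_wheel_task.

    Hypothetical helper: package_name and entry_point must match the wheel you deploy.
    """
    new_task = copy.deepcopy(task)  # leave the caller's task untouched
    new_task.pop("notebook_task", None)
    new_task["python_wheel_task"] = {
        "package_name": package_name,
        "entry_point": entry_point,
        "parameters": list(parameters or []),  # command-line style string arguments
    }
    return new_task


# Task shaped like the one in the question (cluster ID and path are placeholders).
notebook_task = {
    "task_key": "ExecuteNotebook",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Shared/test-notebook", "base_parameters": {}},
    "max_retries": 1,
}

wheel_task = replace_notebook_task(notebook_task, "my_package", "main")
print(wheel_task)
```

All other task settings (cluster, retries) carry over unchanged, so the rest of the question's request body can stay as it is.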
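# Answer 2
**Score**: 0
It's simple: instead of `notebook_task`, you just need to use `python_wheel_task` as described in the REST API docs, and provide the `package_name` and `entry_point` parameters inside its JSON object.
And don't forget to add the wheel file to the task's `libraries` block.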
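Putting both pieces together, a complete `jobs/create` body for a wheel task might look like the sketch below. It assumes the wheel has already been uploaded to DBFS; the path `dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl`, the package name `my_package`, the entry point `main`, the parameters and the helper `build_wheel_job_body` are all hypothetical. It leaves out `timeout_seconds`: the Jobs API applies no timeout by default, whereas the 300-second timeout from the notebook job would stop a wheel that has to run 24/7.

```python
import json


def build_wheel_job_body(cluster_id, wheel_path, package_name, entry_point, parameters=None):
    """Build a Jobs API 2.1 jobs/create body with a single python_wheel_task (hypothetical helper)."""
    return {
        "name": "Run_Python_Wheel",
        "tasks": [
            {
                "task_key": "RunWheel",
                "description": "Run the entry point of the uploaded Python wheel",
                "existing_cluster_id": cluster_id,
                # Replaces "notebook_task" from the notebook job.
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,  # entry point name from the wheel's metadata
                    "parameters": list(parameters or []),
                },
                # Installs the wheel on the cluster for this task.
                "libraries": [{"whl": wheel_path}],
                "max_retries": 1,
                "min_retry_interval_millis": 5000,
                "retry_on_timeout": False,
            }
        ],
        "email_notifications": {},
        "max_concurrent_runs": 1,
    }


# Hypothetical values: replace with your cluster ID, uploaded wheel path, package and entry point.
body = build_wheel_job_body(
    cluster_id="<cluster-id>",
    wheel_path="dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl",
    package_name="my_package",
    entry_point="main",
    parameters=["--env", "dev"],
)
print(json.dumps(body, indent=2))
```

The resulting dict can be sent with the question's existing request, e.g. `requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=body)`; passing `json=` lets `requests` serialize the body and set the `Content-Type: application/json` header.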
