How to create an Azure Databricks job of type "python wheel" using the Azure Databricks API

Question

I would like to **create a Databricks job of type "python wheel"** in Azure by using the **Databricks API**. I have a Python wheel that I need to execute in this job.

This question is related to my other question at [this Stack Overflow link](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform); only the technology used to implement it has changed.

Following the [Azure Databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs), I know how to create a Databricks job that executes a notebook. However, what I need is a **Databricks job of type "python wheel"**.

All my code is implemented in a Python wheel, and it needs to run 24/7. According to the requirements I got from the development team, they need a job of type "python wheel", not "notebook".

As the [Databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) shows, a job of type python wheel can already be created from the Databricks workspace. However, I need to automate this process in a DevOps pipeline, which is why I would like to do it by calling the Databricks API. Below is the code I have implemented to create a Databricks job. This code uses a notebook to execute the code. As I mentioned, I need to run a "python wheel" job just as explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). In the workspace, this type of job looks like this:

[![Databricks job of type python wheel](https://i.stack.imgur.com/kBBPU.png)](https://i.stack.imgur.com/kBBPU.png)

My current code is as below. **My objective is to change it to run a Python wheel instead of a notebook**:

```python
import requests
import os

# Both the 2.0 and 2.1 Jobs APIs can create a job.
dbrks_create_job_url = "https://" + os.environ['DBRKS_INSTANCE'] + ".azuredatabricks.net/api/2.1/jobs/create"

DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/' + os.environ['DBRKS_SUBSCRIPTION_ID']
        + '/resourceGroups/' + os.environ['DBRKS_RESOURCE_GROUP']
        + '/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']
}

# Wrap the values in literal double quotes so they can be spliced into the JSON string below.
CLUSTER_ID = "\"" + os.environ["DBRKS_CLUSTER_ID"] + "\""
NOTEBOOK_LOCATION = "\"" + os.environ["NOTEBOOK_LOCATION"] + "test-notebook" + "\""
print("Notebook path is {}".format(NOTEBOOK_LOCATION))
print(CLUSTER_ID)

body_json = """
{
  "name": "A sample job to trigger from DevOps",
  "tasks": [
    {
      "task_key": "ExecuteNotebook",
      "description": "Execute uploaded notebook including tests",
      "depends_on": [],
      "existing_cluster_id": """ + CLUSTER_ID + """,
      "notebook_task": {
        "notebook_path": """ + NOTEBOOK_LOCATION + """,
        "base_parameters": {}
      },
      "timeout_seconds": 300,
      "max_retries": 1,
      "min_retry_interval_millis": 5000,
      "retry_on_timeout": false
    }
  ],
  "email_notifications": {},
  "name": "Run_Unit_Tests",
  "max_concurrent_runs": 1
}
"""

print("Request body in json format:")
print(body_json)

response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)

if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
    print("Job Id = {}".format(response.json()['job_id']))
    print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{}".format(response.json()['job_id']))
else:
    print("job failed!")
    raise Exception(response.content)
```
How can I change this code so that it creates a Databricks job of type "python wheel" instead?
  2. <details>
  3. <summary>英文:</summary>
  4. I would like to **create a databricks job of type &quot;python wheel&quot;** in Azure by using **databricks API**. I have a python wheel that I need to execute in this job.
  5. This question is related to my other question at [this stackoverflow link](https://stackoverflow.com/questions/75579462/how-to-create-azure-databricks-jobs-of-type-python-wheel-by-terraform), just the technology used to implement this has changed.
  6. Following the [Azure databricks API documentation](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/jobs) I know how to create a databricks job that can execute a notebook. However, what I need is a **databricks job** of **type &quot;python wheel&quot;**.
  7. All my code is implemented in a python wheel and it needs to run 24/7. According to the requirements that I got from the development team, they need to have a job of type &quot;python wheel&quot; and not &quot;notebook&quot;.
  8. As you see [databricks documentation](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html) already shows how a job of type python wheel can be created from the databricks workspace. I, however, need to automate this process in a DevOps pipeline, that&#39;s why I would like to do it by making API call to databricks API. Below is the code I have implemented to created a databricks job. This code is using a notebook to execute the code. As I mentioned I need to run a &quot;python wheel&quot; job just as it is explained [here](https://docs.databricks.com/workflows/jobs/how-to-use-python-wheels-in-workflows.html). Below you can see this type of job in the workspace:
  9. [![Databricks job of type python wheel][1]][1]
  10. My current code is as below: **My objective is to change it to run a python wheel instead of a notebook**:
  11. import requests
  12. import os
  13. # both 2.0 and 2.1 API can create job.
  14. dbrks_create_job_url = &quot;https://&quot;+os.environ[&#39;DBRKS_INSTANCE&#39;]+&quot;.azuredatabricks.net/api/2.1/jobs/create&quot;
  15. DBRKS_REQ_HEADERS = {
  16. &#39;Authorization&#39;: &#39;Bearer &#39; + os.environ[&#39;DBRKS_BEARER_TOKEN&#39;],
  17. &#39;X-Databricks-Azure-Workspace-Resource-Id&#39;: &#39;/subscriptions/&#39;+ os.environ[&#39;DBRKS_SUBSCRIPTION_ID&#39;] +&#39;/resourceGroups/&#39;+ os.environ[&#39;DBRKS_RESOURCE_GROUP&#39;] +&#39;/providers/Microsoft.Databricks/workspaces/&#39; + os.environ[&#39;DBRKS_WORKSPACE_NAME&#39;],
  18. &#39;X-Databricks-Azure-SP-Management-Token&#39;: os.environ[&#39;DBRKS_MANAGEMENT_TOKEN&#39;]}
  19. CLUSTER_ID = &quot;\&quot;&quot; + os.environ[&quot;DBRKS_CLUSTER_ID&quot;] + &quot;\&quot;&quot;
  20. NOTEBOOK_LOCATION = &quot;\&quot;&quot; + os.environ[&quot;NOTEBOOK_LOCATION&quot;] + &quot;test-notebook&quot; + &quot;\&quot;&quot;
  21. print(&quot;Notebook path is {}&quot;.format(NOTEBOOK_LOCATION))
  22. print(CLUSTER_ID)
  23. body_json = &quot;&quot;&quot;
  24. {
  25. &quot;name&quot;: &quot;A sample job to trigger from DevOps&quot;,
  26. &quot;tasks&quot;: [
  27. {
  28. &quot;task_key&quot;: &quot;ExecuteNotebook&quot;,
  29. &quot;description&quot;: &quot;Execute uploaded notebook including tests&quot;,
  30. &quot;depends_on&quot;: [],
  31. &quot;existing_cluster_id&quot;: &quot;&quot;&quot; + CLUSTER_ID + &quot;&quot;&quot;,
  32. &quot;notebook_task&quot;: {
  33. &quot;notebook_path&quot;: &quot;&quot;&quot; + NOTEBOOK_LOCATION + &quot;&quot;&quot;,
  34. &quot;base_parameters&quot;: {}
  35. },
  36. &quot;timeout_seconds&quot;: 300,
  37. &quot;max_retries&quot;: 1,
  38. &quot;min_retry_interval_millis&quot;: 5000,
  39. &quot;retry_on_timeout&quot;: false
  40. }
  41. ],
  42. &quot;email_notifications&quot;: {},
  43. &quot;name&quot;: &quot;Run_Unit_Tests&quot;,
  44. &quot;max_concurrent_runs&quot;: 1}
  45. &quot;&quot;&quot;
  46. print(&quot;Request body in json format:&quot;)
  47. print(body_json)
  48. response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)
  49. if response.status_code == 200:
  50. print(&quot;Job created successfully!&quot;)
  51. print(response.status_code)
  52. print(response.content)
  53. print(&quot;Job Id = {}&quot;.format(response.json()[&#39;job_id&#39;]))
  54. print(&quot;##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{b}&quot;.format(b=response.json()[&#39;job_id&#39;]))
  55. else:
  56. print(&quot;job failed!&quot;)
  57. raise Exception(response.content)
  58. [1]: https://i.stack.imgur.com/kBBPU.png
  59. </details>

# Answer 1

**Score**: 1

As already mentioned by @Alex Ott, instead of using `notebook_task` you need to use `python_wheel_task`, as documented in the [Jobs API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
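As a minimal sketch of what that change could look like (mirroring the style of the question's script), the task definition below uses `python_wheel_task`; the package name `my_package` and entry point `main` are hypothetical placeholders, not values taken from the question:

```python
import os

# Sketch only: a Jobs 2.1 task definition with python_wheel_task in place of notebook_task.
# "my_package" and "main" are placeholders for the package name and entry point
# declared in the wheel's setup.py / pyproject.toml.
wheel_task = {
    "task_key": "ExecuteWheel",
    "existing_cluster_id": os.environ["DBRKS_CLUSTER_ID"],
    "python_wheel_task": {
        "package_name": "my_package",
        "entry_point": "main",
        "parameters": []
    }
}
```
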
  64. <details>
  65. <summary>英文:</summary>
  66. As already mentioned by @Alex Ott, instead of using the `notebook_task` you need to use `pyhton_wheel_task`.
  67. Based on the [Job API 2.1 docs](https://redocly.github.io/redoc/?url=https://learn.microsoft.com/azure/databricks/_extras/api-refs/jobs-2.1-azure.yaml).
  68. </details>
# Answer 2

**Score**: 0

It's simple: instead of `notebook_task`, you just need to use `python_wheel_task`, as described in the REST API docs. You also need to provide the `package_name` and `entry_point` parameters inside the JSON object.

And don't forget to add the wheel file in the `libraries` block.
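A sketch of the question's script adapted along these lines is shown below. It reuses `DBRKS_REQ_HEADERS` from the question's code, builds the body as a Python dict instead of string concatenation, and uses hypothetical placeholders for the package name (`my_package`), entry point (`main`) and the wheel path on DBFS:

```python
import os
import requests

dbrks_create_job_url = "https://" + os.environ['DBRKS_INSTANCE'] + ".azuredatabricks.net/api/2.1/jobs/create"

body = {
    "name": "Run_Python_Wheel",
    "max_concurrent_runs": 1,
    "email_notifications": {},
    "tasks": [
        {
            "task_key": "ExecuteWheel",
            "description": "Execute the uploaded python wheel",
            "depends_on": [],
            "existing_cluster_id": os.environ["DBRKS_CLUSTER_ID"],
            # python_wheel_task replaces notebook_task.
            "python_wheel_task": {
                "package_name": "my_package",   # placeholder: name declared in the wheel's metadata
                "entry_point": "main",          # placeholder: entry point defined in the wheel
                "parameters": []
            },
            # The wheel itself is attached to the cluster through the libraries block.
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl"}  # placeholder path
            ],
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": False
        }
    ]
}

# DBRKS_REQ_HEADERS is the same headers dict built in the question's script.
response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=body)
response.raise_for_status()
print("Job Id = {}".format(response.json()["job_id"]))
```

Arguments for the wheel's entry point would go in `parameters` (or `named_parameters`); an empty list is fine if the entry point takes none.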
  73. <details>
  74. <summary>英文:</summary>
  75. it&#39;s simple - instead of `notebook_task` you just need to use `python_wheel_task` as it&#39;s described in the REST API docs. And you need to provide `package_name` and `entry_point` parameters inside the JSON object.
  76. And don&#39;t forget to add the wheel file in the `libraries` block.
  77. </details>
