How to create Azure Databricks jobs of type python wheel by terraform
Question
I am using Terraform to implement a Databricks job in Azure. I have a Python wheel that I need to execute in this job. Following the Terraform Azure Databricks documentation at this link, I know how to implement a Databricks notebook job. However, what I need is a Databricks job of type "Python wheel". All the examples provided in the linked documentation create Databricks jobs of type `notebook_task`, `spark_jar_task`, or `pipeline_task`; none of these is exactly what I need. If you look into the Databricks workspace, however, you can see there is a specific job type "Python wheel". Below you can see this in the workspace:
Just to elaborate, following the documentation I have already created a job. Below is my main.tf file:
```hcl
resource "databricks_notebook" "this" {
  path     = "/Users/myusername/${var.notebook_subdirectory}/${var.notebook_filename}"
  language = var.notebook_language
  source   = "./${var.notebook_filename}"
}

resource "databricks_job" "sample-tf-job" {
  name                = var.job_name
  existing_cluster_id = "0342-285291-x0vbdshv" ## databricks_cluster.this.cluster_id

  notebook_task {
    notebook_path = databricks_notebook.this.path
  }
}
```
As I said, this job is of type "Notebook", which is also shown in the screenshot. The job I need is of type "Python wheel".
I am pretty sure Terraform already provides the capability to create "Python wheel" jobs, because looking at the source code of the [Terraform provider for Databricks](https://github.com/databricks/terraform-provider-databricks/blob/master/jobs/resource_job.go) I can see that a Python wheel task is defined (currently around line 49). However, it is not clear to me how to invoke it in my code. Below is the source code I am referring to:
```go
// PythonWheelTask contains the information for python wheel jobs
type PythonWheelTask struct {
	EntryPoint      string            `json:"entry_point,omitempty"`
	PackageName     string            `json:"package_name,omitempty"`
	Parameters      []string          `json:"parameters,omitempty"`
	NamedParameters map[string]string `json:"named_parameters,omitempty"`
}
```
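For reference, the JSON tags on that struct map directly onto the attributes the provider's `python_wheel_task` configuration block accepts. A minimal sketch, where the package name, entry point, and parameter values are all illustrative placeholders rather than anything from a real workspace:

```hcl
python_wheel_task {
  package_name = "my_pkg"          # distribution name of the wheel (illustrative)
  entry_point  = "main"            # entry point registered in the wheel's metadata
  parameters   = ["--env", "dev"]  # positional arguments passed to the entry point

  # Alternatively, pass key/value arguments instead of positional ones
  # (use one style or the other, not both):
  # named_parameters = {
  #   env = "dev"
  # }
}
```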
Answer 1
Score: 1
Instead of `notebook_task`, you just need to use the `python_wheel_task` configuration block, as described in the provider documentation. Something like this:
```hcl
resource "databricks_job" "sample-tf-job" {
  name = var.job_name

  task {
    task_key            = "a"
    existing_cluster_id = "0342-285291-x0vbdshv" ## databricks_cluster.this.cluster_id

    python_wheel_task {
      package_name = "my_package"
      entry_point  = "entry_point"
    }

    library {
      whl = "dbfs:/FileStore/baz.whl"
    }
  }
}
```
P.S. It's better not to use an interactive (all-purpose) cluster for this, as it is more expensive than a job cluster.
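Following that advice, here is a sketch of the same task running on an ephemeral job cluster instead of the interactive one. The `spark_version` and `node_type_id` values below are assumptions and must be replaced with values valid in your own workspace:

```hcl
resource "databricks_job" "sample-tf-job" {
  name = var.job_name

  task {
    task_key = "a"

    # Ephemeral job cluster: created when the run starts and torn down when it
    # ends, so you only pay for the duration of the run. The version and node
    # type here are illustrative; look up valid values for your workspace
    # (e.g. via the databricks_spark_version / databricks_node_type data sources).
    new_cluster {
      spark_version = "13.3.x-scala2.12"
      node_type_id  = "Standard_DS3_v2"
      num_workers   = 1
    }

    python_wheel_task {
      package_name = "my_package"
      entry_point  = "entry_point"
    }

    library {
      whl = "dbfs:/FileStore/baz.whl"
    }
  }
}
```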