How to create Azure Databricks jobs of type Python wheel with Terraform

Question

I am using Terraform to implement a Databricks job in Azure. I have a Python wheel that I need to execute in this job. Following the Terraform Azure Databricks documentation at this link, I know how to implement a Databricks notebook job. However, what I need is a Databricks job of type "Python wheel". All the examples provided in the mentioned link create Databricks jobs of type "notebook_task", "spark_jar_task", or "pipeline_task"; none of them is exactly what I need. If you look into the Databricks workspace, however, you can see there is a specific job type "Python wheel". You can see this in the workspace screenshot below:

[Screenshot: the Databricks workspace showing a job of type "Python wheel"]

Just to elaborate: following the documentation, I have already created a job. Below is my main.tf file:

resource "databricks_notebook" "this" {
  path     = "/Users/myusername/${var.notebook_subdirectory}/${var.notebook_filename}"
  language = var.notebook_language
  source   = "./${var.notebook_filename}"
}

resource "databricks_job" "sample-tf-job" {
  name = var.job_name
  existing_cluster_id = "0342-285291-x0vbdshv"  ## databricks_cluster.this.cluster_id
  notebook_task {
    notebook_path = databricks_notebook.this.path 
  }
} 

As I said, this job is of type "Notebook", which is also shown in the screenshot. The job I need is of type "Python wheel".

I am pretty sure Terraform already provides the capability to create "Python wheel" jobs: looking at the source code of the [Terraform provider for Databricks](https://github.com/databricks/terraform-provider-databricks/blob/master/jobs/resource_job.go), I can see that a Python wheel task is defined (currently at line 49). However, it is not clear to me how to call it in my code. Below is the source code I am referring to:

```go
// PythonWheelTask contains the information for Python wheel jobs
type PythonWheelTask struct {
	EntryPoint      string            `json:"entry_point,omitempty"`
	PackageName     string            `json:"package_name,omitempty"`
	Parameters      []string          `json:"parameters,omitempty"`
	NamedParameters map[string]string `json:"named_parameters,omitempty"`
}
```
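
Each field in this struct maps, via its JSON tag, to an attribute of the provider's `python_wheel_task` configuration block. As a rough sketch of how those attributes look in HCL (all values here are placeholders, not taken from the question):

```hcl
python_wheel_task {
  package_name = "my_package" # distribution name of the wheel (placeholder)
  entry_point  = "main"       # entry point declared in the wheel's metadata (placeholder)

  # Arguments passed to the entry point: either positional `parameters`
  # or key/value `named_parameters` can be used.
  named_parameters = {
    "env" = "dev"
  }
}
```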

Answer 1

Score: 1

Instead of notebook_task, you just need to use the python_wheel_task configuration block, as described in the provider documentation. Something like this:

resource "databricks_job" "sample-tf-job" {
  name = var.job_name

  task {
    task_key = "a"
    existing_cluster_id = "0342-285291-x0vbdshv"  ## databricks_cluster.this.cluster_id
    python_wheel_task {
      package_name = "my_package"
      entry_point = "entry_point"
    }
    library {
      whl = "dbfs:/FileStore/baz.whl"
    } 
  }
} 
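
If the wheel is not already on DBFS, it can be uploaded from the same Terraform configuration rather than by hand. A minimal sketch using the provider's `databricks_dbfs_file` resource (the local path and file name below are hypothetical):

```hcl
# Upload the locally built wheel to DBFS so the job's library block can reference it.
resource "databricks_dbfs_file" "wheel" {
  source = "${path.module}/dist/my_package-0.1.0-py3-none-any.whl" # hypothetical local path
  path   = "/FileStore/my_package-0.1.0-py3-none-any.whl"
}
```

The `library` block can then point at this resource (e.g. `whl = databricks_dbfs_file.wheel.dbfs_path`) instead of a hard-coded `dbfs:/...` string.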

P.S. It's better not to use an interactive cluster here, as it's more expensive than a job cluster.
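
Following that advice, the `existing_cluster_id` can be replaced with a `new_cluster` block inside the task, so a job cluster is spun up for each run and terminated afterwards. A minimal sketch (the `spark_version` and `node_type_id` values are illustrative placeholders):

```hcl
resource "databricks_job" "sample-tf-job" {
  name = var.job_name

  task {
    task_key = "a"

    # Job cluster created per run; cheaper than a long-running interactive cluster.
    new_cluster {
      spark_version = "11.3.x-scala2.12" # placeholder: pick a supported Databricks runtime
      node_type_id  = "Standard_DS3_v2"  # placeholder: an Azure VM type
      num_workers   = 1
    }

    python_wheel_task {
      package_name = "my_package"
      entry_point  = "entry_point"
    }
    library {
      whl = "dbfs:/FileStore/baz.whl"
    }
  }
}
```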

Tags: azure-databricks, databricks, databricks-workflows, terraform, terraform-provider-databricks