How to trigger a Databricks job from another Databricks job?

Question

I'm currently working on a project where I have two distinct jobs on Databricks. The second job is dependent on the results of the first one.

I am wondering if there is a way to automatically trigger the second job once the first one has completed successfully. Ideally, I would like to accomplish this directly within Databricks, without the need for an external scheduling or orchestration tool. Has anyone been able to implement this type of setup, or does anyone know if it's possible?

Answer 1

Score: 2

It's possible to start a workflow using the Databricks REST API (the Jobs run-now endpoint).
See the documentation here: https://docs.databricks.com/api/azure/workspace/jobs/runnow
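
For example, here is a minimal sketch of calling that endpoint from the last task of the first job, assuming DATABRICKS_HOST, DATABRICKS_TOKEN, and SECOND_JOB_ID are environment variables you have set yourself (SECOND_JOB_ID being the numeric ID of the job to trigger):

import os

import requests

# Trigger the second job via the Jobs run-now REST endpoint.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # a personal access token

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": int(os.environ["SECOND_JOB_ID"])},
)
resp.raise_for_status()
print("Triggered run:", resp.json()["run_id"])

Running this as the final step of the first job means the second job only starts once the first one has got that far successfully.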

You can also simply combine all the tasks from the 2 workflows into 1 workflow.

Answer 2

Score: 1

It is also possible to run the job programmatically using the Databricks SDK for Python:

import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Authenticates from environment variables or a Databricks config profile.
w = WorkspaceClient()

notebook_path = "/Users/user1/notebook2"

# Start the target cluster if it is not already running, then reuse its ID.
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]
w.clusters.ensure_cluster_is_running(cluster_id)

# Submit a one-time run of the notebook and block until it finishes.
run = w.jobs.submit(
    run_name=f"sdk-{time.time_ns()}",
    tasks=[
        jobs.SubmitTask(
            existing_cluster_id=cluster_id,
            notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
            task_key=f"sdk-{time.time_ns()}",
        )
    ],
).result()


More details: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs.html#JobsAPI.run_now
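
The snippet above submits a one-time run. If the downstream job already exists as a saved job, a shorter variant, sketched here on the assumption that SECOND_JOB_ID is an environment variable holding its numeric job ID, is to trigger it with run_now and wait for it to finish:

# Trigger an existing job by its ID and block until the run completes.
run = w.jobs.run_now(job_id=int(os.environ["SECOND_JOB_ID"])).result()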

Answer 3

Score: 1

Databricks is now rolling out new functionality, called "Job as a Task", that allows you to trigger another job as a task in a workflow. The documentation isn't updated yet, but you can already see it in the UI.

  • Select "Run Job" when adding a new task:

[Screenshot: the "Run Job" task type when adding a new task]

  • Select the specific job to execute as a task:

[Screenshot: choosing the job to execute as a task]
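
The same feature is also exposed through the Jobs API as a run-job task type. Here is a minimal sketch of chaining the two jobs with the Python SDK, assuming your SDK version already ships RunJobTask, and using FIRST_JOB_ID and SECOND_JOB_ID as hypothetical environment variables holding the two job IDs:

import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Create a wrapper workflow that runs the first job, then the second job
# as a dependent "run job" task.
created = w.jobs.create(
    name="job-orchestrator",
    tasks=[
        jobs.Task(
            task_key="first_job",
            # hypothetical env var holding the first job's numeric ID
            run_job_task=jobs.RunJobTask(job_id=int(os.environ["FIRST_JOB_ID"])),
        ),
        jobs.Task(
            task_key="second_job",
            # hypothetical env var holding the second job's numeric ID
            run_job_task=jobs.RunJobTask(job_id=int(os.environ["SECOND_JOB_ID"])),
            depends_on=[jobs.TaskDependency(task_key="first_job")],
        ),
    ],
)
print("Created orchestrator job:", created.job_id)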
