How to trigger a Databricks job from another Databricks job?
Question
I'm currently working on a project where I have two distinct jobs on Databricks. The second job is dependent on the results of the first one.
I am wondering if there is a way to automatically trigger the second job once the first one has completed successfully. Ideally, I would like to accomplish this directly within Databricks, without the need for an external scheduling or orchestration tool. Has anyone been able to implement this type of setup, or does anyone know if it's possible?
Answer 1
Score: 2
It's possible to start a workflow using the Databricks REST API.
See the documentation here: https://docs.databricks.com/api/azure/workspace/jobs/runnow
You can also simply merge all the tasks from the two workflows into a single workflow.
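
As a minimal sketch, the run-now endpoint could be called from the last task of the first job. Everything beyond the endpoint itself is a placeholder here: the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables and the job ID 123 are assumptions, not values from the question.

import os
import requests

# Placeholder workspace URL and personal access token, read from the environment.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]

# POST /api/2.1/jobs/run-now starts a run of an existing job by its ID.
resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},  # placeholder ID of the second job
)
resp.raise_for_status()
print(resp.json()["run_id"])  # ID of the run that was just triggered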
Answer 2
Score: 1
It is also possible to run a job programmatically using the Databricks SDK:
import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

notebook_path = "/Users/user1/notebook2"

# Make sure the target cluster is running before submitting the run.
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]
w.clusters.ensure_cluster_is_running(cluster_id)

# Submit a one-time run and block until it finishes.
run = w.jobs.submit(
    run_name=f"sdk-{time.time_ns()}",
    tasks=[
        jobs.SubmitTask(
            existing_cluster_id=cluster_id,
            notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
            task_key=f"sdk-{time.time_ns()}",
        )
    ],
).result()
More details: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs.html#JobsAPI.run_now
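
Note that jobs.submit above creates a one-time run from scratch. To trigger the second job as it is already saved in the workspace (which is what the run_now link documents), a minimal sketch could be the following, where the job ID 123 is a placeholder:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Trigger the already-defined job by its ID and block until the run finishes.
run = w.jobs.run_now(job_id=123).result()  # 123 is a placeholder job ID
print(run.state)  # final state of the triggered run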
Answer 3
Score: 1
Databricks is now rolling out a new feature called "Job as a Task" that lets a workflow trigger another job as one of its tasks. The documentation has not been updated yet, but you can already see it in the UI; a sketch of the equivalent API call follows the steps below.
- Select "Run Job" when adding a new task:
- Select the specific job to execute as the task:
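
The same setup can be defined programmatically through the Jobs API. A minimal sketch with the Python SDK, assuming an SDK version that already exposes RunJobTask and using placeholder job IDs 111 and 222 for the two existing jobs:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Create a parent workflow whose tasks each trigger an existing job.
created = w.jobs.create(
    name="parent-workflow",  # hypothetical name
    tasks=[
        jobs.Task(
            task_key="run_first_job",
            run_job_task=jobs.RunJobTask(job_id=111),  # placeholder ID of job 1
        ),
        jobs.Task(
            task_key="run_second_job",
            run_job_task=jobs.RunJobTask(job_id=222),  # placeholder ID of job 2
            # Only start the second job once the first one succeeds.
            depends_on=[jobs.TaskDependency(task_key="run_first_job")],
        ),
    ],
)
print(created.job_id)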