How to trigger a Databricks job from another Databricks job?
Question
I'm currently working on a project where I have two distinct jobs on Databricks. The second job is dependent on the results of the first one.
I am wondering if there is a way to automatically trigger the second job once the first one has completed successfully. Ideally, I would like to accomplish this directly within Databricks, without the need for an external scheduling or orchestration tool. Has anyone been able to implement this type of setup, or does anyone know if it's possible?
Answer 1
Score: 2
It's possible to start a workflow using the Databricks REST API.
See the documentation here: https://docs.databricks.com/api/azure/workspace/jobs/runnow
You can also simply merge all the tasks from the two workflows into a single workflow.
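
As a minimal sketch, the run-now endpoint could be called from the last task of the first job. Everything beyond the endpoint itself is a placeholder here: the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables and the job ID 123 are assumptions, not values from the question.

import os
import requests

# Placeholder workspace URL and personal access token, read from the environment.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]

# POST /api/2.1/jobs/run-now starts a run of an existing job by its ID.
resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},  # placeholder ID of the second job
)
resp.raise_for_status()
print(resp.json()["run_id"])  # ID of the run that was just triggered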
Answer 2
Score: 1
It is also possible to run a job programmatically using the Databricks SDK:
import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

notebook_path = "/Users/user1/notebook2"

# Make sure the target cluster is running before submitting the run.
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]
w.clusters.ensure_cluster_is_running(cluster_id)

# Submit a one-time run and block until it finishes.
run = w.jobs.submit(
    run_name=f"sdk-{time.time_ns()}",
    tasks=[
        jobs.SubmitTask(
            existing_cluster_id=cluster_id,
            notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
            task_key=f"sdk-{time.time_ns()}",
        )
    ],
).result()
More details: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs.html#JobsAPI.run_now
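
Note that jobs.submit above creates a one-time run from scratch. To trigger the second job as it is already saved in the workspace (which is what the run_now link documents), a minimal sketch could be the following, where the job ID 123 is a placeholder:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Trigger the already-defined job by its ID and block until the run finishes.
run = w.jobs.run_now(job_id=123).result()  # 123 is a placeholder job ID
print(run.state)  # final state of the triggered run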
Answer 3
Score: 1
Databricks is now rolling out a new feature called "Job as a Task" that lets a workflow trigger another job as one of its tasks. The documentation has not been updated yet, but you can already see it in the UI; a sketch of the equivalent API call follows the steps below.
- Select "Run Job" when adding a new task:
- Select the specific job to execute as the task:
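
The same setup can be defined programmatically through the Jobs API. A minimal sketch with the Python SDK, assuming an SDK version that already exposes RunJobTask and using placeholder job IDs 111 and 222 for the two existing jobs:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Create a parent workflow whose tasks each trigger an existing job.
created = w.jobs.create(
    name="parent-workflow",  # hypothetical name
    tasks=[
        jobs.Task(
            task_key="run_first_job",
            run_job_task=jobs.RunJobTask(job_id=111),  # placeholder ID of job 1
        ),
        jobs.Task(
            task_key="run_second_job",
            run_job_task=jobs.RunJobTask(job_id=222),  # placeholder ID of job 2
            # Only start the second job once the first one succeeds.
            depends_on=[jobs.TaskDependency(task_key="run_first_job")],
        ),
    ],
)
print(created.job_id)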