
Can we set task-wise parameters using the Databricks Jobs API "run-now"?

Question


I have a job with multiple tasks, e.g. Task1 -> Task2. I am trying to trigger the job using the "run now" API. The task details are below:

Task1 - It executes a notebook with some input parameters

Task2 - It executes a notebook with some input parameters

So, how can I provide parameters for task1 and task2 when calling the Jobs API with the "run now" command?

I have a parameter "lib" which needs to have the value 'pandas' in one task and 'spark' in the other.

I know that we can give the parameters unique names like Task1_lib and Task2_lib and read them that way.

Current approach:
json = {"job_id": 3234234, "notebook_params": {"Task1_lib": "a", "Task2_lib": "b"}}

Is there a way to send task-wise parameters?
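For reference, a minimal sketch of that unique-name workaround, assuming placeholder values for the token and workspace URL: the job is triggered once via run-now, and each notebook reads only the parameter named for it via dbutils.widgets.get.

import requests

token = "TOKEN"          # placeholder: personal access token
workspace = "WORKSPACE"  # placeholder: e.g. "https://<instance>.cloud.databricks.com"
headers = {"Authorization": f"Bearer {token}"}

# One run-now call; the uniquely named keys let each task pick out its own value.
payload = {
    "job_id": 3234234,
    "notebook_params": {"Task1_lib": "pandas", "Task2_lib": "spark"},
}
resp = requests.post(f"{workspace}/api/2.0/jobs/run-now", headers=headers, json=payload)
resp.raise_for_status()

# Inside the Task1 notebook, the value is then read with:
#   lib = dbutils.widgets.get("Task1_lib")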

Answer 1

Score: 1


A bit of a hack, but I managed to do it by first updating the tasks with the Jobs update API and then running the job.

Here is a rough template of how to do it in Python:

import requests

token = TOKEN          # placeholder: your personal access token
workspace = WORKSPACE  # placeholder: e.g. "https://<instance>.cloud.databricks.com"
job_id = JOB_ID        # placeholder: ID of the job whose tasks you want to update
headers = {"Authorization": f"Bearer {token}"}
url = f"{workspace}/api/2.0/jobs/update"

tasks = [{
    "task_key": "task_1",
    "notebook_task": {
        "notebook_path": "/path/to/notebook",
        "base_parameters": {
            "param_1": "new param here",
            "param_2":  "new param here",
        },
        "source": "WORKSPACE",
    },
    "job_cluster_key": "Job_cluster",
    "timeout_seconds": 0,
},
{
    "task_key": "task_2",
    "notebook_task": {
        "notebook_path": "/path/to/notebook",
        "base_parameters": {
            "param_1": "new param here",
            "param_2":  "new param here",
        },
        "source": "WORKSPACE",
    },
    "job_cluster_key": "Job_cluster",
    "timeout_seconds": 0,
}]


json = {
    "job_id": job_id,
    "new_settings" : {"tasks": tasks}
}

resp = requests.post(url=url, headers=headers, json=json)
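The answer describes updating the tasks first and then running the job, but only shows the update call. A minimal sketch of the follow-up "run now" request, reusing the placeholders above:

# After the update succeeds, trigger the job; it runs with the freshly
# written base_parameters.
run_url = f"{workspace}/api/2.0/jobs/run-now"
run_resp = requests.post(url=run_url, headers=headers, json={"job_id": job_id})
run_resp.raise_for_status()
print(run_resp.json())  # the response includes the run_id of the new run

Note that this approach mutates the job definition itself, so any other run triggered before you update again will also pick up these parameter values.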

Answer 2

Score: 0

It's not supported right now - parameters are defined at the job level. You can ask your Databricks representative (if you have one) to communicate this request to the product team that works on Databricks Workflows.
