
Can we set task-wise parameters using the Databricks Jobs API "run-now"?

Question


I have a job with multiple tasks, e.g. Task1 -> Task2. I am trying to trigger the job using the "run now" API. The task details are below:

Task1 - It executes a notebook with some input parameters

Task2 - It executes a notebook with some input parameters

So, how can I provide parameters for task1 and task2 when calling the Jobs API with the "run now" command?

I have a parameter "lib" which needs to have the value 'pandas' in one task and 'spark' in the other.

I know that we can give the parameters unique names like Task1_lib and Task2_lib and read them that way.

Current approach:
json = {"job_id": 3234234, "notebook_params": {"Task1_lib": "a", "Task2_lib": "b"}}

Is there a way to send task-wise parameters?
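For reference, a minimal sketch of that unique-name workaround, assuming placeholder values for the token and workspace URL: the job is triggered once via run-now, and each notebook reads only the parameter named for it via dbutils.widgets.get.

import requests

token = "TOKEN"          # placeholder: personal access token
workspace = "WORKSPACE"  # placeholder: e.g. "https://<instance>.cloud.databricks.com"
headers = {"Authorization": f"Bearer {token}"}

# One run-now call; the uniquely named keys let each task pick out its own value.
payload = {
    "job_id": 3234234,
    "notebook_params": {"Task1_lib": "pandas", "Task2_lib": "spark"},
}
resp = requests.post(f"{workspace}/api/2.0/jobs/run-now", headers=headers, json=payload)
resp.raise_for_status()

# Inside the Task1 notebook, the value is then read with:
#   lib = dbutils.widgets.get("Task1_lib")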

Answer 1

Score: 1


A bit of a hack, but I managed to do it by first updating the tasks with the Jobs update API and then running the job.

Here is a rough template of how to do it in Python:

import requests

token = TOKEN          # placeholder: your personal access token
workspace = WORKSPACE  # placeholder: e.g. "https://<instance>.cloud.databricks.com"
job_id = JOB_ID        # placeholder: ID of the job whose tasks you want to update
headers = {"Authorization": f"Bearer {token}"}
url = f"{workspace}/api/2.0/jobs/update"

tasks = [{
    "task_key": "task_1",
    "notebook_task": {
        "notebook_path": "/path/to/notebook",
        "base_parameters": {
            "param_1": "new param here",
            "param_2":  "new param here",
        },
        "source": "WORKSPACE",
    },
    "job_cluster_key": "Job_cluster",
    "timeout_seconds": 0,
},
{
    "task_key": "task_2",
    "notebook_task": {
        "notebook_path": "/path/to/notebook",
        "base_parameters": {
            "param_1": "new param here",
            "param_2":  "new param here",
        },
        "source": "WORKSPACE",
    },
    "job_cluster_key": "Job_cluster",
    "timeout_seconds": 0,
}]


json = {
    "job_id": job_id,
    "new_settings" : {"tasks": tasks}
}

resp = requests.post(url=url, headers=headers, json=json)
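The answer describes updating the tasks first and then running the job, but only shows the update call. A minimal sketch of the follow-up "run now" request, reusing the placeholders above:

# After the update succeeds, trigger the job; it runs with the freshly
# written base_parameters.
run_url = f"{workspace}/api/2.0/jobs/run-now"
run_resp = requests.post(url=run_url, headers=headers, json={"job_id": job_id})
run_resp.raise_for_status()
print(run_resp.json())  # the response includes the run_id of the new run

Note that this approach mutates the job definition itself, so any other run triggered before you update again will also pick up these parameter values.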

Answer 2

Score: 0

It's not supported right now - parameters are defined at the job level. You can ask your Databricks representative (if you have one) to communicate this request to the product team that works on Databricks Workflows.
