How to run/re-run a subset of jobs in an AWS Glue workflow?

Question

I am building an AWS Glue workflow composed of long-running jobs, many of which are subject to failure. Is there any way to re-run a specific branch of my workflow after a failure?

For example, my workflow looks something like this:

<Start Trigger> -> [Job 1] -> [Job 2] -> [Job 4]
       ↳ [Job 4]

Let's say [Job 1] and [Job 4] each take 3 hours and both complete successfully. Then [Job 2] is triggered but fails, leaving my workflow in this state:

<Start Trigger> -> [Job 1 ✔] -> [Job 2 ✗] -> [Job 4]
       ↳ [Job 4 ✔]

I make a change that fixes [Job 2] and believe it will succeed when re-run. I'd like to re-run only the [Job 2] -> [Job 4] branch of the workflow, since all other parent jobs have completed successfully.

Is there any way to do this in AWS Glue? I'm considering building an AWS Step Functions workflow of Glue jobs instead, as Step Functions workflows seem to have this functionality.

Answer 1

Score: 1

AWS Glue has supported this since August 2020. See the documentation on resuming a workflow run:

https://docs.aws.amazon.com/glue/latest/dg/resuming-workflow.html
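
For anyone who wants to script this rather than use the console, here is a minimal sketch using the boto3 Glue client's `get_workflow_run` and `resume_workflow_run` calls. It assumes the failed nodes are JOB nodes and that you want to restart from every job whose run ended in a failed state; the helper names (`failed_job_node_ids`, `resume_from_failures`), the set of states treated as failures, and the workflow name and run ID in the usage comment are my own placeholders, not something from the question or the AWS docs. As far as I understand, Glue re-runs the selected nodes and everything downstream of them, and tracks the resumed run under a new run ID.

```python
from __future__ import annotations

from typing import Any, Optional

# Job run states treated as "needs a re-run" (an assumption; adjust as needed).
FAILED_STATES = {"FAILED", "TIMEOUT", "ERROR", "STOPPED"}


def failed_job_node_ids(run: dict[str, Any]) -> list[str]:
    """Return UniqueIds of JOB nodes that have a failed run and no successful one."""
    node_ids = []
    for node in run.get("Graph", {}).get("Nodes", []):
        if node.get("Type") != "JOB":
            continue  # skip TRIGGER and CRAWLER nodes
        states = [
            job_run.get("JobRunState")
            for job_run in node.get("JobDetails", {}).get("JobRuns", [])
        ]
        # Nodes that never started (e.g. downstream of the failure) have no runs
        # and are skipped; they are picked up when their parent is resumed.
        if any(s in FAILED_STATES for s in states) and "SUCCEEDED" not in states:
            node_ids.append(node["UniqueId"])
    return node_ids


def resume_from_failures(glue: Any, workflow_name: str, run_id: str) -> Optional[str]:
    """Resume a workflow run from its failed job nodes; return the new run ID."""
    run = glue.get_workflow_run(
        Name=workflow_name, RunId=run_id, IncludeGraph=True
    )["Run"]
    node_ids = failed_job_node_ids(run)
    if not node_ids:
        return None  # nothing failed, so there is nothing to resume
    response = glue.resume_workflow_run(
        Name=workflow_name, RunId=run_id, NodeIds=node_ids
    )
    return response["RunId"]


# Usage (requires boto3 and AWS credentials; both names below are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   new_run_id = resume_from_failures(glue, "my-workflow", "<workflow-run-id>")
```

In the example from the question, this would select only [Job 2], so the jobs that already succeeded are not repeated.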

Posted by huangapple on January 3, 2020, 18:28:24