Automatic heartbeat of Cadence workflow/activity

Question

We have registered the activities with the auto-heartbeat configuration EnableAutoHeartBeat: true and have also set the activity option HeartbeatTimeout: 15Min in the activity implementation.

  1. Do we still need to send heartbeats explicitly using activity.heartbeat(), or does the Go client library take care of it automatically?
  2. If it's automatic, what happens if the activity is waiting for an external API response that takes longer than 15 minutes?
  3. What happens to the activity heartbeat if the worker executing the activity crashes or is killed?
  4. Will Cadence retry the activities after heartbeat failures?

Answer 1

Score: 1

  1. No. With this config, the SDK takes care of it.

  2. Auto-heartbeat sends a heartbeat at a fixed interval of 80% of the heartbeat timeout (every 12 minutes in your case, given the 15-minute timeout), so the activity won't time out as long as the activity worker is still alive, even while it waits on a slow external API.

     So you should use a smaller heartbeat timeout; 10-20 s is ideal.

  3. The activity will fail with a "heartbeat timeout" error.

  4. Yes, if you have set a retry policy.

     See my other answer for retry policy:

     https://stackoverflow.com/questions/65139178/how-to-set-proper-timeout-values-for-cadence-activitieslocal-and-regular-activi

Example

Let's say your activity implementation waits on an AWS SDK API call for up to 2 hours (the maximum API timeout configured).

You should still use 10-20 s for the heartbeat timeout, and set the activity start-to-close timeout to 2 hours.
The heartbeat timeout exists to detect that the host is no longer alive, so that the activity can be restarted as early as possible.
Imagine this case:
Because the API call takes 2 hours, the activity worker gets restarted at some point during those 2 hours.
If the heartbeat timeout is 15 minutes, Cadence retries the activity only once that timeout fires, which can be up to 15 minutes later.
If the heartbeat timeout is 10 s, Cadence can retry it after about 10 s, because the heartbeat timeout fires within 10 seconds.

Posted by huangapple on 2023-02-16 09:07:43