Controller not notified about final Job status

Question

I'm building an app that spawns Jobs (batch/v1), and I need to update my Custom Resource status with the Job status.

I set up the controller with the following:

func (r *JobsManagedByRequestedBackupActionObserver) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&riotkitorgv1alpha1.RequestedBackupAction{}).
		Owns(&batchv1.Job{}).
		Owns(&batchv1.CronJob{}).
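		// Drop delete events only; create, update and generic events still
		// pass, because unset funcs in predicate.Funcs default to true.
		WithEventFilter(predicate.Funcs{
			DeleteFunc: func(e event.DeleteEvent) bool {
				return false
			},
		}).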
		Complete(r)
}

During Reconcile(ctx context.Context, req ctrl.Request) I fetch my RequestedBackupAction object (based on "req") and then fetch the Jobs from the API using a dedicated tracking label:

list, err := kj.client.Jobs(namespace).List(ctx, metav1.ListOptions{LabelSelector: v1alpha1.LabelTrackingId + "=" + trackingId})

When I iterate over the objects with:

for _, job := range list.Items {
	logrus.Errorf("[++++++++++++] JOB name=%s, failed=%v, active=%v, succeeded=%v", job.Name, job.Status.Failed, job.Status.Active, job.Status.Succeeded)
}

Then I get multiple entries like this:

time="2022-12-12T20:00:55Z" level=error msg="[++++++++++++] JOB name=app1-backup-vmqrp, failed=0, active=1, succeeded=0"

But I never get a final entry with failed=1, active=0, succeeded=0, even though the Job has actually finished. The point is that the controller is not being notified.

This is the final Job status:

  status:
    conditions:
    - lastProbeTime: "2022-12-12T20:00:56Z"
      lastTransitionTime: "2022-12-12T20:00:56Z"
      message: Job has reached the specified backoff limit
      reason: BackoffLimitExceeded
      status: "True"
      type: Failed
    failed: 1
    ready: 0
    startTime: "2022-12-12T20:00:50Z"
    uncountedTerminatedPods: {}

What could be wrong?

Answer 1

Score: 1

The solution was really dead simple: when the object is not ready, requeue it, which for a Job means waiting until it has finished. I still don't understand why the controller is not notified about the state change from active=1 to active=0 and from failed=0 to failed=1.

Example:

if healthStatus.Running {
	return ctrl.Result{Requeue: true}, nil
}
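
One way to decide when to stop requeueing is to look at the Job's conditions rather than the counters: a batch/v1 Job is finished once it carries a condition of type Complete or Failed with status "True", as in the status dump from the question. The sketch below illustrates that check. JobCondition and jobOutcome are hypothetical stand-ins (not the real batchv1 types) so that it runs without a cluster, and since healthStatus is not defined in the original snippet, this is only a guess at what it might compute.

```go
package main

import "fmt"

// JobCondition is a simplified, hypothetical stand-in for batchv1.JobCondition.
type JobCondition struct {
	Type   string // "Complete" or "Failed" mark a terminal state
	Status string // "True", "False" or "Unknown"
	Reason string
}

// jobOutcome reports whether the Job reached a terminal state and, if so,
// whether it failed and with which reason.
func jobOutcome(conds []JobCondition) (finished, failed bool, reason string) {
	for _, c := range conds {
		if c.Status != "True" {
			continue // only conditions that currently hold are relevant
		}
		switch c.Type {
		case "Complete":
			return true, false, c.Reason
		case "Failed":
			return true, true, c.Reason
		}
	}
	return false, false, ""
}

func main() {
	// A Job that is still running has no terminal condition yet.
	running := []JobCondition{}
	// Mirrors the final status shown in the question.
	failedJob := []JobCondition{{Type: "Failed", Status: "True", Reason: "BackoffLimitExceeded"}}

	finished, _, _ := jobOutcome(running)
	fmt.Println("running job finished:", finished)

	finished, failed, reason := jobOutcome(failedJob)
	fmt.Println("failed job:", finished, failed, reason)
}
```

In a reconciler, the "not finished" case is where the requeue would go. Returning ctrl.Result{RequeueAfter: ...} with a short delay instead of Requeue: true may be a gentler way to poll.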

Answer 2

Score: 0

Did you set the controller reference on your owned resources before creating them?

The function SetControllerReference populates the metadata.ownerReferences field of the owned resource with a reference to the parent resource. Without that owner reference, changes to the owned resource cannot trigger a reconcile of the parent resource.

// instance is your custom resource
// job is the resource that is supposed to be owned by instance

err = ctrl.SetControllerReference(instance, job, r.Scheme)
if err != nil {
	return err
}

err = r.Create(ctx, job)
if err != nil {
	return err
}
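
To make the link between owner references and reconcile triggers more concrete, here is a simplified model of the lookup that Owns() relies on: when an owned object changes, the handler looks for an owner reference marked as controller whose kind and API group match the type given to For(), and enqueues a request for that owner's name. OwnerReference and ownerToReconcile below are hypothetical stand-ins, not controller-runtime code. The API group "riotkit.org" is assumed purely for illustration, and Controller is a plain bool here although the real field is a *bool.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerReference is a simplified, hypothetical stand-in for metav1.OwnerReference.
type OwnerReference struct {
	APIVersion string // "group/version", or just "version" for core types
	Kind       string
	Name       string
	Controller bool
}

// ownerToReconcile returns the name of the controlling owner of the given
// group and kind, or false if the object has no such owner reference.
func ownerToReconcile(refs []OwnerReference, group, kind string) (string, bool) {
	for _, ref := range refs {
		if !ref.Controller || ref.Kind != kind {
			continue
		}
		// Extract the group part of APIVersion; core types have none.
		refGroup := ""
		if i := strings.Index(ref.APIVersion, "/"); i >= 0 {
			refGroup = ref.APIVersion[:i]
		}
		if refGroup == group {
			return ref.Name, true
		}
	}
	return "", false
}

func main() {
	// A Job created without SetControllerReference: nothing to enqueue.
	_, ok := ownerToReconcile(nil, "riotkit.org", "RequestedBackupAction")
	fmt.Println("job without owner triggers reconcile:", ok)

	// A Job whose ownerReferences point at the custom resource.
	refs := []OwnerReference{{
		APIVersion: "riotkit.org/v1alpha1", // assumed group and version
		Kind:       "RequestedBackupAction",
		Name:       "app1-backup",
		Controller: true,
	}}
	name, ok := ownerToReconcile(refs, "riotkit.org", "RequestedBackupAction")
	fmt.Println("job with owner triggers reconcile:", ok, "for", name)
}
```

In this model, a Job that is only connected to the custom resource through a tracking label yields no reconcile request at all, which would match the behaviour described in the question.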

Posted by huangapple on 2022-12-13 04:30:43
When reposting, please keep the link to this article: https://go.coder-hub.com/74776966.html