K8s controller update status and condition

Question

I have a k8s controller that needs to install some resources and update the status and conditions accordingly.

The flow in the reconcile is as follows:

  1. Install the resource without waiting.
  2. Call the function checkAvailability and update the status accordingly: ready, pending install, or error.

I have two main questions:

  1. This is the first time I am using status and conditions. Is this the right way, or am I missing something?
  2. Sometimes when I call r.Status().Update I get the error: Operation cannot be fulfilled on eds.core.vtw.bmw.com "resouce01": the object has been modified; please apply your changes to the latest version and try again. So I added the conditionChanged check, which solves the problem, but I am not sure it is correct. I update the status once, and if it has not changed I do not touch it, so the user may see a ready status from a while ago; the reconcile does not update the date and time of the ready condition, because it skips the update when the condition is already "ready".

I use the following code:

func (r *ebdReconciler) checkHealth(ctx context.Context, req ctrl.Request, ebd ebdmanv1alpha1.ebd) (bool, error) {
    vfmReady, err := r.mbr.IsReady(ctx, req.Name, req.Namespace)
    condition := metav1.Condition{
        Type:               ebdmanv1alpha1.KubernetesvfmHealthy,
        ObservedGeneration: ebd.Generation,
        LastTransitionTime: metav1.Now(),
    }
    if err != nil {
        // There was an error checking readiness - Set status to false
        condition.Status = metav1.ConditionFalse
        condition.Reason = ebdmanv1alpha1.ReasonError
        condition.Message = fmt.Sprintf("Failed to check vfm readiness: %v", err)
    } else if vfmReady {
        // The vfm is ready - Set status to true
        condition.Status = metav1.ConditionTrue
        condition.Reason = ebdmanv1alpha1.ReasonReady
        condition.Message = "vfm custom resource is ready"
    } else {
        // The vfm is not ready - Set status to false
        condition.Status = metav1.ConditionFalse
        condition.Reason = ebdmanv1alpha1.ResourceProgressing
        condition.Message = "vfm custom resource is not ready"
    }
    // Get the latest version of the ebd
    latestebd := ebdmanv1alpha1.ebd{}
    if err := r.Get(ctx, req.NamespacedName, &latestebd); err != nil {
        return vfmReady, err
    }

    oldConditions := latestebd.Status.Conditions
    meta.SetStatusCondition(&latestebd.Status.Conditions, condition)

    if !conditionChanged(&oldConditions, &latestebd.Status.Conditions, ebdmanv1alpha1.KubernetesvfmHealthy) {
        return vfmReady, nil
    }

    if err := r.Status().Update(ctx, &latestebd); err != nil {
        r.Log.Error(err, "failed to update vfm status")
        return vfmReady, err
    }
    return vfmReady, nil
}

func conditionChanged(oldConditions, newConditions *[]metav1.Condition, conditionType string) bool {
    newCondition := meta.FindStatusCondition(*newConditions, conditionType)
    oldCondition := meta.FindStatusCondition(*oldConditions, conditionType)
    if oldCondition == nil && newCondition == nil {
        return false
    }
    if oldCondition == nil || newCondition == nil {
        return true
    }
    return oldCondition.Status != newCondition.Status || oldCondition.Reason != newCondition.Reason || oldCondition.Message != newCondition.Message
}

Answer 1

Score: 3

To your questions:

  1. This is the first time that I use status and conditions, is it right way or do I miss something?

    Your approach to managing the status and conditions of Kubernetes resources is generally fine. The status subresource of a Kubernetes API object represents the observed state of the system, and it can include conditions.

    A condition is a collection of fields that describes one aspect of an object's state in more detail than a bare true or false. Each condition typically has a type, status, reason, message, and lastTransitionTime. Your code sets these fields based on whether the vfm custom resource is ready.

    Note that conditions should be level-based: each one should be set to its currently observed value regardless of its previous value. A condition should also be set (to true, false, or unknown) for every significant or user-meaningful aspect of the component's current state. This makes conditions a good mechanism for indicating transient states such as Progressing or Degraded, which are expected to change over time or with external state.

  2. Sometimes when I do the update r.Status().Update I got error: Operation cannot be fulfilled on eds.core.vtw.bmw.com "resource01": the object has been modified; please apply your changes to the latest version and try again.

    This error occurs because another client updated the same object after your controller read it and before your update arrived. The other client could be another controller, or another instance of the same controller if you run more than one.

    One way to handle this is a retry mechanism that re-attempts the status update when the error occurs. You have implemented a conditionChanged check so that the status update is only attempted when the condition has changed. That is a good way to avoid unnecessary updates, but it does not completely prevent the error, because another client can still update the object between your Get call and your Status().Update call.

    You could also consider using Patch instead of Update to modify the status. A patch sends only the fields you changed rather than the whole object, so you are less likely to run into conflicts with other writers.

    Regarding the timing issue, consider updating LastTransitionTime only when the status actually changes, not every time the health check runs. LastTransitionTime then reflects when the status last changed rather than when the check was last performed.

    Keep in mind that frequent writes to the status subresource, even when nothing has changed, put unnecessary load on the API server. Aim to update the status only when it changes.
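The conflict described in point 2 can be reproduced without a cluster. The following is a minimal stand-alone sketch, not Kubernetes code: store, object, errConflict, setReadyWithRetry and the interfere hook are all hypothetical names invented for the illustration. It models an API server that rejects a write when the caller's resourceVersion is stale, and a writer that re-reads and re-applies its change on every attempt.

```go
package main

import "errors"

// errConflict stands in for the API server's "the object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// object is a hypothetical stand-in for a custom resource.
type object struct {
	resourceVersion int
	ready           bool
}

// store is a toy in-memory "API server" holding a single object.
type store struct{ current object }

// get returns a copy of the stored object, like a client Get.
func (s *store) get() object { return s.current }

// update accepts the write only if the caller read the latest resourceVersion.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// setReadyWithRetry re-reads the object and re-applies the change on every
// attempt. interfere is a demo-only hook that runs between the read and the
// write to simulate another client. It returns the attempt number that
// succeeded, or the last error.
func setReadyWithRetry(s *store, attempts int, interfere func(attempt int)) (int, error) {
	for i := 1; i <= attempts; i++ {
		obj := s.get()   // always start from the latest version
		obj.ready = true // re-apply the desired change to the fresh copy
		interfere(i)
		err := s.update(obj)
		if err == nil {
			return i, nil
		}
		if !errors.Is(err, errConflict) {
			return i, err
		}
	}
	return attempts, errConflict
}
```

If the interfere hook writes to the store during the first attempt only, the first update is rejected with errConflict and the second attempt succeeds. The important detail is that the change is applied after the re-read; re-reading alone would throw the change away.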

A possible updated version of your checkHealth function considering those points could be:

func (r *ebdReconciler) checkHealth(ctx context.Context, req ctrl.Request, ebd ebdmanv1alpha1.ebd) (bool, error) {
    vfmReady, err := r.mbr.IsReady(ctx, req.Name, req.Namespace)
    condition := metav1.Condition{
        Type:   ebdmanv1alpha1.KubernetesvfmHealthy,
        Status: metav1.ConditionUnknown, // start with unknown status
    }

    if err != nil {
        // There was an error checking readiness - Set status to false
        condition.Status = metav1.ConditionFalse
        condition.Reason = ebdmanv1alpha1.ReasonError
        condition.Message = fmt.Sprintf("Failed to check vfm readiness: %v", err)
    } else if vfmReady {
        // The vfm is ready - Set status to true
        condition.Status = metav1.ConditionTrue
        condition.Reason = ebdmanv1alpha1.ReasonReady
        condition.Message = "vfm custom resource is ready"
    } else {
        // The vfm is not ready - Set status to false
        condition.Status = metav1.ConditionFalse
        condition.Reason = ebdmanv1alpha1.ResourceProgressing
        condition.Message = "vfm custom resource is not ready"
    }

    // Retry on conflict
    // RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
    retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // Retrieve the latest version of ebd on every attempt
        latestebd := ebdmanv1alpha1.ebd{}
        if getErr := r.Get(ctx, req.NamespacedName, &latestebd); getErr != nil {
            return getErr
        }

        // Skip the write if status, reason and message are unchanged.
        // Compare before SetStatusCondition: FindStatusCondition returns a
        // pointer into the slice that SetStatusCondition modifies in place.
        oldCondition := meta.FindStatusCondition(latestebd.Status.Conditions, condition.Type)
        if oldCondition != nil && condition.Status == oldCondition.Status && condition.Reason == oldCondition.Reason && condition.Message == oldCondition.Message {
            return nil
        }

        // Apply the condition to the freshly fetched object.
        // SetStatusCondition only moves LastTransitionTime when the status changes.
        meta.SetStatusCondition(&latestebd.Status.Conditions, condition)
        return r.Status().Update(ctx, &latestebd)
    })

    if retryErr != nil {
        r.Log.Error(retryErr, "Failed to update vfm status after retries")
        return vfmReady, retryErr
    }

    return vfmReady, nil
}

In this updated version:

  • The LastTransitionTime field is updated only when the condition's status changes, so it reflects when the status last changed rather than when checkHealth or the reconciliation loop last ran.

  • A retry mechanism is added using retry.RetryOnConflict to re-attempt the status update when a conflict error occurs. Note that you'll need to import the "k8s.io/client-go/util/retry" package for this.
    This is a common pattern for dealing with the Operation cannot be fulfilled... error.

These changes should help to address the issues you were facing with updating the status and conditions of your Kubernetes resources.
Remember that you may still occasionally get conflict errors, especially if there are other clients updating the same object. In these cases, the RetryOnConflict function will retry the update with the latest version of the object.

huangapple
  • Published on June 2, 2023 at 15:25:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76388004.html