Negative error budget even when the Service Level Indicator (SLI) is greater than the Service Level Objective (SLO)
Question
I created a request-based SLI where the good service filter is the CPU usage time of a specific VM instance ("instance-1" here) and the total service filter is the CPU usage of all VM instances. I set the SLO goal to 50%.
I thought that since the SLI is greater than the SLO, the error budget should be positive, but I got a negative error budget. What does this mean?
The service I used here is a custom service. The graph shows the good/total ratio of CPU utilization for the instance "instance-1".
Answer 1
Score: 1
A service's error budget is the number of failures (errors or other bad events) that the service is allowed to experience over a given period of time. The error budget is calculated from the Service Level Objective (SLO). For instance, if your SLO is 99.9% uptime, your error budget is the remaining 0.1%. This is the allowable margin of error: the amount of downtime or the number of errors that can be tolerated within a specific period. You can find more information in this document.
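The arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the remaining budget is reported as the fraction of allowed bad events not yet used over the compliance period. The function name `error_budget_remaining` and the sample request counts are hypothetical and are not part of any Cloud Monitoring API.

```python
def error_budget_remaining(good: float, total: float, slo_goal: float) -> float:
    """Return the fraction of the error budget still unused (assumed formula).

    Assumption: budget = (1 - slo_goal) * total events in the compliance
    period, and the remaining fraction is 1 - bad / budget. A negative result
    means more bad events occurred than the budget allows.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= slo_goal < 1:
        raise ValueError("slo_goal must be in [0, 1)")
    allowed_bad = (1 - slo_goal) * total  # size of the error budget in events
    actual_bad = total - good             # events that were not good
    return 1 - actual_bad / allowed_bad


# Hypothetical numbers: a 99.9% SLO over 1,000,000 requests allows about
# 1,000 bad requests. With 400 bad requests, roughly 60% of the budget is left.
print(error_budget_remaining(good=999_600, total=1_000_000, slo_goal=0.999))

# With 2,500 bad requests the budget is overspent, so the result is negative.
print(error_budget_remaining(good=997_500, total=1_000_000, slo_goal=0.999))
```

Under this assumed formula the budget stays positive only while the good/total ratio over the whole compliance period is above the SLO goal.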
The negative error budget of -84.97% indicates that the service has consumed more than its allotted error budget. This can happen when service reliability drops significantly below the agreed Service Level Objective (SLO); refer to this document for relevant information. In your case, however, the Service Level Indicator (SLI) is greater than the SLO, so this needs further investigation. You can create an alert to monitor when the SLO has been breached. If you don't get any alerts, you can raise a support ticket on the Public Issue Tracker with a description of your issue. The Issue Tracker is a forum for end users to report bugs and request features to improve Google Cloud products.
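One way to start that investigation is to work backwards from the reported budget. The sketch below assumes the remaining budget equals 1 - (1 - SLI) / (1 - SLO), evaluated over the whole compliance period. That formula is an assumption made for illustration, not something confirmed by the question or the linked documents, and `implied_sli` is a hypothetical helper name.

```python
def implied_sli(budget_remaining: float, slo_goal: float) -> float:
    """Back out the period-wide SLI from a remaining-budget fraction.

    Assumption: budget_remaining = 1 - (1 - sli) / (1 - slo_goal),
    which rearranges to sli = 1 - (1 - budget_remaining) * (1 - slo_goal).
    """
    if not 0 <= slo_goal < 1:
        raise ValueError("slo_goal must be in [0, 1)")
    return 1 - (1 - budget_remaining) * (1 - slo_goal)


# Values from the question: remaining budget of -84.97% with a 50% SLO goal.
sli = implied_sli(budget_remaining=-0.8497, slo_goal=0.5)
print(f"Implied SLI over the compliance period: {sli:.2%}")
```

If the assumed formula holds, a remaining budget of -84.97% with a 50% goal corresponds to a period-wide good/total ratio of roughly 7.5%, well below the goal. That would suggest the chart is showing a different window or ratio than the one used to compute the budget, which is worth checking before filing a ticket.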
You can also look at this document for more details.