2023-02-10 14:17:02

Prometheus - Percentage of gauge values below a certain threshold

Question

I'm using the blackbox exporter to gather metrics from various endpoints, and I want to set up an SLI that tracks, per service, how many GET requests are slower than 300ms and 1s.
The exporter provides a gauge metric called probe_duration_seconds.
I'm trying to write a PromQL query that calculates the percentage of probe_duration_seconds samples below 300ms over the last 5 hours.

My current query:

probe_duration_seconds{}[5h] < 0.3

returns an error:

> Error executing query: invalid parameter "query": 1:1: parse error:
> binary expression must contain only scalar and instant vector types.

I have also tried:

100 - sum(rate(probe_success{}[5h]) * 100) by (instance)

which gives me the overall success/failure rate, but I also want to quantify it based on response time.

Answer 1

Score: 1

Prometheus doesn't provide a function that returns the percentage of raw samples with values below a given threshold over a given lookbehind window. This can be emulated with the subquery feature. For example, the following query returns the share of probe_duration_seconds samples with values smaller than 0.3 during the last 5 hours (a value between 0 and 1; multiply by 100 for a percentage):

count_over_time((probe_duration_seconds < 0.3)[5h:1m])
  /
count_over_time((probe_duration_seconds)[5h:1m])

This query assumes that Prometheus collects raw samples every minute; see the 1m after the colon in square brackets. Set it to your real scrape interval for more accurate results.
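As a sketch of how the same pattern could cover both thresholds from the question, the queries below return the percentage of samples slower than 300ms and slower than 1s. They assume a 1m scrape interval, as above. The `or 0 * ...` fallback is an addition that is not part of the original answer: without it, a series with no samples above the threshold produces an empty numerator and disappears from the result instead of showing 0.

```promql
# Percentage of probes slower than 300ms over the last 5 hours.
# Assumes a 1m scrape interval; adjust the subquery step if yours differs.
100 * (
    count_over_time((probe_duration_seconds > 0.3)[5h:1m])
  or
    0 * count_over_time(probe_duration_seconds[5h:1m])
)
  /
count_over_time(probe_duration_seconds[5h:1m])

# Same calculation for the 1s threshold.
100 * (
    count_over_time((probe_duration_seconds > 1)[5h:1m])
  or
    0 * count_over_time(probe_duration_seconds[5h:1m])
)
  /
count_over_time(probe_duration_seconds[5h:1m])
```

Both sides of the division keep the original series labels (such as instance), so the result is reported per probed target.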

P.S. VictoriaMetrics, an alternative Prometheus-like solution I work on, provides the share_le_over_time() function, which can be used instead of the query above:

share_le_over_time(probe_duration_seconds[5h], 0.3)

This approach has the following advantages over the subquery-based approach:

It is easier to write and maintain.
It works with any scrape_interval between raw samples, so there is no need to adjust the query for different scrape intervals.
It runs faster and consumes less memory during execution, since the subquery in the initial approach may allocate large amounts of memory for small scrape intervals and large lookbehind windows.
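
For the "slower than" form of the SLI in the question, a possible MetricsQL sketch is shown below. It assumes your VictoriaMetrics version provides share_gt_over_time(), which MetricsQL documents alongside share_le_over_time(). Note that share_le_over_time() counts samples less than or equal to the threshold, whereas the subquery above uses a strict comparison.

```promql
# Percentage of probes slower than 300ms over the last 5 hours (MetricsQL only).
100 * share_gt_over_time(probe_duration_seconds[5h], 0.3)

# Percentage of probes slower than 1s over the last 5 hours.
100 * share_gt_over_time(probe_duration_seconds[5h], 1)
```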
