Prometheus query by label with range vectors
Question
I'm defining a lot of counters in my app (using Java Micrometer), and in order to trigger alerts I tag the counters I want to monitor with "error":"alert", so a query like {error="alert"} will generate multiple range vectors:
error_counter_component1{error="alert", label2="random"}
error_counter_component2{error="alert", label2="random2"}
error_counter_component3{error="none", label2="random3"}
I don't control the names of the counters; I can only add the label to the counters I want to use in my alert. The alert I want is: if all the counters labeled with error="alert" increase by more than 3 in one hour. For that I could use a query like increase({error="alert"}[1h]) > 3, but I get the following error in Prometheus: Error executing query: vector cannot contain metrics with the same labelset
Is there a way to merge two range vectors, or should I include some kind of tag in the name of the counter? Or should I have a single counter for errors, with labels that specify the source, something like this:
errors_counter{source="component1", use_in_alert="yes"}
errors_counter{source="component2", use_in_alerts="yes"}
errors_counter{source="component3", use_in_alerts="no"}
Answer 1

Score: 1
The version with the source="componentX" label fits the Prometheus data model much better. This assumes the errors_counter metric really is one metric and that, apart from the source label value, it has the same labels etc. (for example, it is emitted by the same library or framework).
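With that layout the alert no longer has to mix different metric names, so a query along these lines should work (a sketch, using the errors_counter name from the question):

# one series per component under a single metric name, so increase()
# returns distinct label sets and the original error goes away
increase(errors_counter[1h]) > 3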
Adding something like a use_in_alerts label is not a great solution. Such a label does not identify a time series.
I'd say put the list of components to alert on wherever your alerting queries are constructed, and dynamically create separate alerting rules (without adding such a label to the raw data).
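The generated rules could then contain expressions roughly like the following (a sketch; the component names are the hypothetical ones from the question, and the list would come from wherever you keep your alerting configuration):

# one generated alert expression per component you care about
increase(errors_counter{source="component1"}[1h]) > 3
increase(errors_counter{source="component2"}[1h]) > 3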
Another solution is to have a separate pseudo-metric that is only used to provide metadata about the components, like:

component_alert_on{source="component2"} 1

and combine it in the alerting rule to only alert on the components you need. It can be generated in any possible way, but one possibility is to add it in a static recording rule. The downside is that it complicates the alerting query somewhat.
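The combined alerting expression could then look roughly like this (a sketch, assuming the errors_counter{source=...} layout discussed above and the component_alert_on metric from the example):

# keep only the components that have an opt-in component_alert_on series
(increase(errors_counter[1h]) > 3)
  and on(source)
component_alert_on == 1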
But of course the use_in_alerts label will probably also work (at least as long as you are only alerting on this one metric).
Comments