Alerts repeating within minutes using Grafana Alerting

Question

I'm using Grafana 9.2.2 with VictoriaMetrics as the data source to send alerts when certain criteria are met.
An external service delivers the alerts: its API is configured as a webhook contact point, receives the payload, and processes it further for delivery to Slack.

Alert evaluation behaviour is set to "Evaluate every 1h for 0s". I want the alert to fire as soon as the condition is met, and to evaluate every 1h because that is how often new data points arrive.

Expected behaviour: alert once every 24 hours after the condition is met.

Actual behaviour: once the condition is met, the alert is triggered (as it should be). However, the same alert is sent again within 5 minutes.

How can I handle this?

Options tried:

  1. Notification policy timings: I experimented with the group interval, repeat interval and group wait while grouping by alertname and grafana-folder, but it didn't help. I also tried grouping by alert_uid, but that was not interpreted. Am I using the wrong combination of timings (together with the alert evaluation period)?
  2. Do I need to send an acknowledgment back to Grafana after receiving the payload? If so, please share how, or link any documentation you can find. I haven't been able to find anything that answers yes/no and how. I have isolated the issue to Grafana: the API that triggers delivery is being called twice.
  3. There are options to mute and/or silence an alert. Is that the approach to follow here? If so, should the alert be muted for 24 hours after firing once, since I don't want it to repeat within the next 24 hours?
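
For reference on option 1, here is a rough sketch of how the three notification policy timings interact for one alert group that keeps firing. It assumes a simplified Alertmanager-style model: the first notification goes out after the group wait, the group is re-checked every group interval, and a repeat is only sent once the repeat interval has elapsed. The function name and the values used are illustrative assumptions, not settings taken from the setup above.

```python
from datetime import timedelta


def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Offsets (from the moment the alert starts firing) at which a single,
    continuously firing alert group is notified, in a simplified model."""
    times = []
    last_sent = None
    tick = group_wait  # the first flush happens after group_wait
    while tick <= horizon:
        # Send on the first flush, then only when repeat_interval has elapsed.
        if last_sent is None or tick - last_sent >= repeat_interval:
            times.append(tick)
            last_sent = tick
        tick += group_interval  # later flushes happen every group_interval
    return times


# Hypothetical values: 30s group wait, 5m group interval, 24h repeat interval.
times = notification_times(
    group_wait=timedelta(seconds=30),
    group_interval=timedelta(minutes=5),
    repeat_interval=timedelta(hours=24),
    horizon=timedelta(hours=49),
)
for offset in times:
    print(offset)
```

Under this model the notifications are about 24 hours apart, so a repeat within 5 minutes would not be explained by these timings alone for a single instance with an unchanged alert group.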

Thanks. Much appreciated.

Answer 1

Score: 0

The issue was multiple instances of Grafana running independently of each other. We had two Grafana pods running, and both were serving the request, hence the duplicate notifications.
I still need to look into how to run Grafana in cluster mode in the future.
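
Until the pods are clustered, one possible stopgap is to drop duplicates in the service behind the webhook. The sketch below is only an illustration: it assumes the webhook payload carries an "alerts" list whose items have "fingerprint" and "status" fields (as in Alertmanager-style payloads), and the class name, function name and 10-minute window are hypothetical choices.

```python
import time


class DuplicateFilter:
    """Remember (fingerprint, status) pairs for a fixed window and reject repeats."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._last_forwarded = {}  # key -> time the alert was last forwarded

    def should_forward(self, alert):
        # Status is part of the key so a "resolved" message is never
        # suppressed by an earlier "firing" one.
        key = (alert.get("fingerprint"), alert.get("status"))
        now = self.clock()
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self.window:
            return False
        self._last_forwarded[key] = now
        return True


def filter_payload(payload, dedupe):
    """Return only the alerts from a webhook payload that were not seen recently."""
    return [a for a in payload.get("alerts", []) if dedupe.should_forward(a)]


# Usage with a fake clock: the same alert arrives twice, 5 minutes apart.
now = [0.0]
dedupe = DuplicateFilter(window_seconds=600, clock=lambda: now[0])
payload = {"alerts": [{"fingerprint": "abc123", "status": "firing"}]}
first = filter_payload(payload, dedupe)   # forwarded
now[0] = 300.0
second = filter_payload(payload, dedupe)  # dropped as a duplicate
```

The state here lives in memory in a single process, so it only helps if the receiving service itself runs as one instance. Grafana's documentation describes a high-availability setup for alerting (settings such as ha_peers under [unified_alerting]), which is probably the "cluster mode" meant above; that has not been verified against 9.2.2 here.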

huangapple
  • Posted on 2023-01-23 12:19:57
  • Source: https://go.coder-hub.com/75205642.html