Argo CD notifications for long-running sync operations

Question

I'm running Argo CD 2.7.4 and I want to get an alert for sync operations that take longer than a few minutes. I tried using the example from the documentation (https://argo-cd.readthedocs.io/en/stable/operator-manual/notifications/triggers/#functions):

  when: time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Minutes() >= 5

However, that only triggers after the operation has finished, which is kind of useless for my use case. Is it actually possible to create an alert like this with Argo CD directly, or do I need to build it myself?
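For reference, the condition in that trigger is just "at least five minutes have elapsed since startedAt". A minimal Python equivalent of the comparison is sketched below; it only illustrates the arithmetic and is not how Argo CD evaluates trigger expressions. The function name sync_running_too_long is made up for this sketch.

```python
from datetime import datetime, timedelta, timezone


def sync_running_too_long(started_at: str, now: datetime, limit_minutes: float = 5) -> bool:
    """Return True if at least limit_minutes have elapsed since started_at."""
    # Timestamps such as "2023-06-12T20:00:00Z" end in "Z"; older Python
    # versions of fromisoformat() reject that suffix, so normalise it first.
    started = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    return (now - started) >= timedelta(minutes=limit_minutes)


now = datetime(2023, 6, 12, 20, 7, tzinfo=timezone.utc)
print(sync_running_too_long("2023-06-12T20:00:00Z", now))  # 7 minutes elapsed
print(sync_running_too_long("2023-06-12T20:04:00Z", now))  # 3 minutes elapsed
```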

Answer 1

Score: 1

I suspect this is not natively supported, which means you would need to use a combination of Argo CD's API and a monitoring tool like Prometheus. The idea would be to expose Argo CD's metrics to Prometheus and then create an alert based on the duration of the sync operation.

Here is a high-level outline of the steps you can follow:

First, expose Argo CD's Application Controller metrics to Prometheus through the argocd-metrics service.

Then set up Prometheus to scrape those metrics from Argo CD in its scrape configuration.

See, for example, "Argo CD metrics with Prometheus operator and Datadog" by Ritesh Nanda (Oct. 2022).

Then create a custom alert in Prometheus based on the duration of the sync operation.

That would involve querying the applications' status (ApplicationService): a GET request to the /api/v1/applications/{applicationName} endpoint returns the application's status. Note that status.sync.status only reports Synced, OutOfSync or Unknown; whether a sync is currently in progress is reported in status.operationState.phase (value Running), and its start time is in status.operationState.startedAt.

To expose the sync operation durations to Prometheus, you would need to create a custom exporter. This exporter would run a script like the following to fetch the durations of running sync operations and expose them as a Prometheus metric.

import os
import time

import requests
from dateutil.parser import parse
from prometheus_client import start_http_server, Gauge

# URL of the Argo CD API server
argo_cd_api_server = "http://<argocd-server>:<port>"

# The Argo CD API requires authentication. Assumption: an API token is
# provided in the ARGOCD_TOKEN environment variable.
headers = {"Authorization": f"Bearer {os.environ.get('ARGOCD_TOKEN', '')}"}

# Create a Gauge metric to track sync operation durations
SYNC_DURATION = Gauge('argocd_sync_duration_seconds', 'Duration of sync operations', ['application'])

def fetch_sync_durations():
    # Get a list of all applications; each item is expected to include its status
    response = requests.get(f"{argo_cd_api_server}/api/v1/applications", headers=headers, timeout=30)
    response.raise_for_status()
    applications = response.json().get("items") or []

    for application in applications:
        name = application["metadata"]["name"]
        operation_state = application.get("status", {}).get("operationState") or {}

        # If the application is currently syncing, calculate the duration of the sync operation
        if operation_state.get("phase") == "Running":
            started_at = parse(operation_state["startedAt"])
            current_duration = time.time() - started_at.timestamp()
            SYNC_DURATION.labels(application=name).set(current_duration)
        else:
            # Reset the gauge so a finished sync does not keep its last value
            SYNC_DURATION.labels(application=name).set(0)

if __name__ == '__main__':
    # Start up the server to expose the metrics
    start_http_server(8000)
    # Fetch sync operation durations every 60 seconds
    while True:
        fetch_sync_durations()
        time.sleep(60)

That would define a Gauge metric named argocd_sync_duration_seconds to track the duration of sync operations. The fetch_sync_durations function fetches the durations and sets the Gauge value for each application that is currently syncing. The script then starts an HTTP server on port 8000 to expose the metrics and runs fetch_sync_durations every 60 seconds.
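The duration logic can be exercised without a running Argo CD server by separating it from the HTTP calls. The sketch below assumes that an in-progress sync shows up as status.operationState.phase == "Running" with a startedAt timestamp; the function name running_sync_durations and the sample application names are invented for illustration.

```python
from datetime import datetime, timezone


def running_sync_durations(applications, now):
    """Map application name -> seconds since its running operation started."""
    durations = {}
    for app in applications:
        state = (app.get("status") or {}).get("operationState") or {}
        if state.get("phase") != "Running":
            continue  # no operation, or the last one already finished
        started = datetime.fromisoformat(state["startedAt"].replace("Z", "+00:00"))
        durations[app["metadata"]["name"]] = (now - started).total_seconds()
    return durations


# Hand-written sample shaped like the "items" list of /api/v1/applications
sample_items = [
    {"metadata": {"name": "guestbook"},
     "status": {"operationState": {"phase": "Running", "startedAt": "2023-06-12T20:00:00Z"}}},
    {"metadata": {"name": "billing"},
     "status": {"operationState": {"phase": "Succeeded", "startedAt": "2023-06-12T19:00:00Z"}}},
    {"metadata": {"name": "never-synced"}, "status": {}},
]
now = datetime(2023, 6, 12, 20, 10, tzinfo=timezone.utc)
print(running_sync_durations(sample_items, now))  # only guestbook is still running
```

Keeping this part pure makes it easy to unit test, while the exporter loop stays a thin wrapper around the API call and the Gauge update.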

Finally, configure Alertmanager to send notifications based on the alert. You can follow the Prometheus Alertmanager documentation to set it up and configure the notification channels (e.g., Slack or email).
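The alert condition itself reduces to a threshold on the gauge, for example an expression along the lines of argocd_sync_duration_seconds > 300 for five minutes. As a rough way to sanity-check that threshold, the sketch below applies it to a result shaped like the response of Prometheus's instant-query HTTP API (/api/v1/query). The function name apps_over_threshold and the sample data are assumptions made for this example.

```python
def apps_over_threshold(query_response, threshold_seconds=300):
    """Return sorted application names whose gauge value exceeds the threshold."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    over = []
    for sample in query_response["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string
        if float(value) > threshold_seconds:
            over.append(sample["metric"].get("application", "<unknown>"))
    return sorted(over)


# Hand-written sample shaped like an instant-query response for
# the metric argocd_sync_duration_seconds
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"application": "guestbook"}, "value": [1686600600, "612.4"]},
            {"metric": {"application": "billing"}, "value": [1686600600, "42"]},
        ],
    },
}
print(apps_over_threshold(sample_response))  # guestbook is above 300 seconds
```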

A bit convoluted, but that would enable a custom alert for long-running sync operations in Argo CD.

huangapple
  • Published on June 12, 2023 at 22:22:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76457598.html