Not able to ingest metrics from my Spring Boot app using AWS Distro for OpenTelemetry

Question

I have a Spring Boot application deployed on an AWS ECS Fargate cluster. As a sidecar container, I have deployed an "adot-collector" container to scrape metrics from Amazon Elastic Container Service (Amazon ECS) and ingest them into Amazon Managed Service for Prometheus using AWS Distro for OpenTelemetry (ADOT).

My Spring app exposes an API, "/actuator/prometheus", on port 8080 that publishes my Java application metrics, and I want ADOT to scrape this API for metrics. Below is the ADOT collector config I have in place.

adot.config.yaml

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "prometheus"
        static_configs:
        - targets: [ 0.0.0.0:9090 ]
      - job_name: "my-spring-app"
        metrics_path: "actuator/prometheus"
        static_configs:
        - targets: [ 0.0.0.0:8080 ]

  awsecscontainermetrics:
    collection_interval: 10s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: https://xxx/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: us-west-2
    service: aps
    assume_role:
      arn:
      sts_region: eu-west-2
service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]

And the ECS task definition looks like this:

task definition:

{
  "family": "adot-prom",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "adot-collector",
      "image": "account_id.dkr.ecr.region.amazonaws.com/image-tag",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-adot-collector",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "prometheus",
      "image": "prom/prometheus:main",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-prom",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "my-spring-app",
      "image": "ecr repo url",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-app",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "app",
          "awslogs-create-group": "True"
        },
        "portMappings": [{
          "containerPort": 8080
        }]
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024"
}

But I am not sure why I always receive the following error when Prometheus tries to scrape metrics from my /actuator/prometheus endpoint, even though the endpoint exists.

Error:

debug scrape/scrape.go:1353 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus", "target": "http://0.0.0.0:8080/actuator/prometheus", "error": "server returned HTTP status 404 Not Found", "errorVerbose": "server returned HTTP status 404 Not Found\ngithub.com/prometheus/prometheus/scrape.(*targetScraper).scrape\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:817\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1340\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1264\nruntime.goexit\n\truntime/asm_amd64.s:1598"}
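
A 404 (rather than a refused connection) suggests that something answered on port 8080 but had no handler at that path. One common cause in Spring Boot, offered here as a possibility and not as a confirmed diagnosis, is that the Prometheus actuator endpoint is not exposed over HTTP, or that the "micrometer-registry-prometheus" dependency is missing from the classpath. A minimal application.yml sketch, assuming the default actuator base path and the port 8080 stated above:

```yaml
# application.yml (sketch only; assumes micrometer-registry-prometheus is on the classpath)
server:
  port: 8080                           # port stated in the question
management:
  endpoints:
    web:
      exposure:
        include: "health,prometheus"   # exposes /actuator/health and /actuator/prometheus
```

With this in place, a request to http://localhost:8080/actuator/prometheus from inside the task should return plain-text metrics instead of a 404.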

Answer 1

Score: 1

I tried to replicate your setup, but since you don't provide details about the container image you use, I had to come up with a similar setup that does what you want and works as expected.

Using the official ADOT image and the appropriate ECS configuration, the following ECS task definition works (you will have to replace the xxxxxxxxxxx values with your own values, such as the account ID and AMP workspace ID):

{
    "taskDefinitionArn": "arn:aws:ecs:eu-west-1:xxxxxxxxxxx:task-definition/adot:1",
    "containerDefinitions": [
        {
            "name": "aws-otel-collector",
            "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.29.1",
            "essential": true,
            "command": [
                "--config=/etc/ecs/ecs-amp-prometheus.yaml"
            ],
            "environment": [
                {
                    "name": "AWS_PROMETHEUS_SCRAPING_ENDPOINT",
                    "value": "localhost:8765"
                },
                {
                    "name": "AWS_REGION",
                    "value": "eu-west-1"
                },
                {
                    "name": "AWS_PROMETHEUS_ENDPOINT",
                    "value": "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/xxxxxxxxxxx/api/v1/remote_write"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "load-gen",
            "image": "public.ecr.aws/h0h9t7p1/alpine-bash-curl-jq:latest",
            "portMappings": [
                {
                    "name": "load-gen-80-tcp",
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "command": [
                "/bin/bash",
                "-c",
                "sleep 15; while : ; do curl -s -o /dev/null localhost:8765 ; sleep 1; done"
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/adot",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "metrics-source",
            "image": "public.ecr.aws/mhausenblas/ho11y:stable",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "metrics-source-8765-tcp",
                    "containerPort": 8765,
                    "hostPort": 8765,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/adot",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "adot",
    "taskRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
    "networkMode": "awsvpc",
    "revision": 1,
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.extensible-ephemeral-storage"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "2048",
    "memory": "4096",
    "ephemeralStorage": {
        "sizeInGiB": 21
    },
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    }
}
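
The task definition above relies on a config file bundled with the ADOT image (/etc/ecs/ecs-amp-prometheus.yaml) plus three environment variables. For the Spring Boot app in the question, whose metrics live under /actuator/prometheus and not the Prometheus default /metrics, a custom collector config is one option. What follows is a minimal sketch and not the bundled file: the job name, the localhost:8080 target, and the region are assumptions, and the workspace ID placeholder is kept from the task definition. With "awsvpc" networking, containers in the same task share a network namespace, so the sidecar can typically reach the app on localhost.

```yaml
# Sketch only: a custom collector config for scraping a Spring Boot sidecar target.
extensions:
  sigv4auth:
    region: eu-west-1          # assumption: region of the AMP workspace
    service: aps

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "my-spring-app"               # hypothetical job name
          metrics_path: "/actuator/prometheus"    # leading slash; Spring Boot actuator path
          static_configs:
            - targets: ["localhost:8080"]         # assumption: app container in the same task

exporters:
  prometheusremotewrite:
    # Placeholder workspace ID, as in the task definition above.
    endpoint: "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/xxxxxxxxxxx/api/v1/remote_write"
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

How this file reaches the collector container (for example, a custom image that copies it in and a matching --config argument) is left open here.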

When you then use the AMP workspace as a data source in AMG, you can see the resulting metrics (the original post showed them in the Explore view).

huangapple
  • Published on May 25, 2023, 13:00:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76329070.html