Unable to ingest metrics from my Spring Boot app using AWS Distro for OpenTelemetry
Question
I have a Spring Boot application deployed on an AWS ECS Fargate cluster. As a sidecar, I deployed an "adot-collector" container to scrape metrics from Amazon Elastic Container Service (Amazon ECS) and ingest them into Amazon Managed Service for Prometheus using AWS Distro for OpenTelemetry (ADOT).
My Spring app exposes its Java application metrics at "/actuator/prometheus" on port 8080, and I want ADOT to scrape this endpoint. Below is the ADOT collector config I have in place.
adot.config.yaml
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "prometheus"
          static_configs:
            - targets: [ 0.0.0.0:9090 ]
        - job_name: "my-spring-app"
          metrics_path: "actuator/prometheus"
          static_configs:
            - targets: [ 0.0.0.0:8080 ]
  awsecscontainermetrics:
    collection_interval: 10s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: https://xxx/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: us-west-2
    service: aps
    assume_role:
      arn:
      sts_region: eu-west-2
service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]
The ECS task definition looks like this:
task definition:
{
  "family": "adot-prom",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "adot-collector",
      "image": "account_id.dkr.ecr.region.amazonaws.com/image-tag",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-adot-collector",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "prometheus",
      "image": "prom/prometheus:main",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-prom",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "my-spring-app",
      "image": "ecr repo url",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-app",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "app",
          "awslogs-create-group": "True"
        },
        "portMappings": [{
          "containerPort": 8080
        }]
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024"
}
However, I'm not sure why I always receive the following error when Prometheus tries to scrape metrics from my /actuator/prometheus endpoint, even though the endpoint exists.
Error:
debug scrape/scrape.go:1353 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus", "target": "http://0.0.0.0:8080/actuator/prometheus", "error": "server returned HTTP status 404 Not Found", "errorVerbose": "server returned HTTP status 404 Not Found\ngithub.com/prometheus/prometheus/scrape.(*targetScraper).scrape\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:817\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1340\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1264\nruntime.goexit\n\truntime/asm_amd64.s:1598"}
Answer 1
Score: 1
I tried to replicate your setup, but since you didn't provide details about the container image you use, I had to come up with a similar setup that does what you want and works as expected.
Using the official ADOT image and the appropriate ECS configuration, the following ECS task definition works (you will have to replace the xxxxxxxxxxx values with your own, such as the account ID and AMP workspace ID):
{
  "taskDefinitionArn": "arn:aws:ecs:eu-west-1:xxxxxxxxxxx:task-definition/adot:1",
  "containerDefinitions": [
    {
      "name": "aws-otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.29.1",
      "essential": true,
      "command": [
        "--config=/etc/ecs/ecs-amp-prometheus.yaml"
      ],
      "environment": [
        {
          "name": "AWS_PROMETHEUS_SCRAPING_ENDPOINT",
          "value": "localhost:8765"
        },
        {
          "name": "AWS_REGION",
          "value": "eu-west-1"
        },
        {
          "name": "AWS_PROMETHEUS_ENDPOINT",
          "value": "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/xxxxxxxxxxx/api/v1/remote_write"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    },
    {
      "name": "load-gen",
      "image": "public.ecr.aws/h0h9t7p1/alpine-bash-curl-jq:latest",
      "portMappings": [
        {
          "name": "load-gen-80-tcp",
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "command": [
        "/bin/bash",
        "-c",
        "sleep 15; while : ; do curl -s -o /dev/null localhost:8765 ; sleep 1; done"
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/adot",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    },
    {
      "name": "metrics-source",
      "image": "public.ecr.aws/mhausenblas/ho11y:stable",
      "cpu": 0,
      "portMappings": [
        {
          "name": "metrics-source-8765-tcp",
          "containerPort": 8765,
          "hostPort": 8765,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/adot",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "adot",
  "taskRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
  "executionRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 1,
  "requiresAttributes": [
    {
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    },
    {
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "name": "ecs.capability.extensible-ephemeral-storage"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "name": "ecs.capability.task-eni"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
    }
  ],
  "placementConstraints": [],
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "2048",
  "memory": "4096",
  "ephemeralStorage": {
    "sizeInGiB": 21
  },
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
When you then use the AMP workspace as a data source in Amazon Managed Grafana (AMG), you can see the result (shown here in Explore):
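To carry the same pattern over to the custom collector config from the question, the scrape job could be sketched roughly as follows. This is only a hedged illustration and was not part of the tested setup: it assumes the Spring app runs in the same awsvpc task as the collector (so it is reachable over localhost, just as localhost:8765 is in the task definition above) and that the actuator path should carry a leading slash. Whether either change resolves the 404 from the question is not confirmed here.

```yaml
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "my-spring-app"
          # Assumption: leading slash added; the question uses "actuator/prometheus".
          metrics_path: "/actuator/prometheus"
          static_configs:
            # Assumption: containers in one awsvpc task share a network namespace,
            # so the sidecar can reach the app on localhost instead of 0.0.0.0.
            - targets: [ "localhost:8080" ]
```

The remaining sections of the question's config (processors, exporters, extensions, service) would stay as they are; only the scrape target and path differ in this sketch.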