query-exporter in Docker container not working
Question
I am trying to get query-exporter to run in a Docker container. With advice from the developer, I have enabled IPv6 in Docker by putting:
{
  "experimental": true,
  "ip6tables": true
}
in my Docker daemon.json and restarting the daemon.
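As a quick sanity check, the daemon settings and the IPv6-enabled network can be verified along these lines (the network name below is an assumption; compose prefixes it with the project name, so check docker network ls for the real name):

# The server section of `docker info` should now report experimental mode
docker info | grep -i experimental
# After bringing the stack up, the slurm network should report EnableIPv6 = true
# (replace <project> with your compose project name, usually the directory name)
docker network inspect -f '{{.EnableIPv6}}' <project>_slurm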
I am using the following docker-compose file:
version: "3.3"
services:
prometheus:
container_name: prometheus
image: prom/prometheus
restart: always
volumes:
- ./prometheus:/etc/prometheus/
- prometheus_data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
ports:
- 9090:9090
networks:
- prom_app_net
grafana:
container_name: grafana
image: grafana/grafana
user: '472'
restart: always
environment:
GF_INSTALL_PLUGINS: 'grafana-clock-panel,grafana-simple-json-datasource'
volumes:
- grafana_data:/var/lib/grafana
- ./grafana/provisioning/:/etc/grafana/provisioning/
- './grafana/grafana.ini:/etc/grafana/grafana.ini'
env_file:
- ./grafana/.env_grafana
ports:
- 3000:3000
depends_on:
- prometheus
networks:
- prom_app_net
mysql:
image: mariadb:10.10
hostname: mysql
container_name: mysql
environment:
MYSQL_RANDOM_ROOT_PASSWORD: "yes"
MYSQL_DATABASE: slurm_acct_db
MYSQL_USER: slurm
MYSQL_PASSWORD: password
volumes:
- var_lib_mysql:/var/lib/mysql
networks:
- slurm
# network_mode: host
slurmdbd:
image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
build:
context: .
args:
SLURM_TAG: ${SLURM_TAG:-slurm-21-08-6-1}
command: ["slurmdbd"]
container_name: slurmdbd
hostname: slurmdbd
volumes:
- etc_munge:/etc/munge
- etc_slurm:/etc/slurm
- var_log_slurm:/var/log/slurm
- cgroups:/sys/fs/cgroup:ro
expose:
- "6819"
ports:
- "6819:6819"
depends_on:
- mysql
privileged: true
cgroup: host
networks:
- slurm
#network_mode: host
slurmctld:
image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
command: ["slurmctld"]
container_name: slurmctld
hostname: slurmctld
volumes:
- etc_munge:/etc/munge
- etc_slurm:/etc/slurm
- slurm_jobdir:/data
- var_log_slurm:/var/log/slurm
- etc_prometheus:/etc/prometheus
- /sys/fs/cgroup:/sys/fs/cgroup:rw
expose:
- "6817"
- "8080"
- "8081"
- "8082/tcp"
ports:
- 8080:8080
- 8081:8081
- 8082:8082/tcp
depends_on:
- "slurmdbd"
privileged: true
cgroup: host
#network_mode: host
networks:
- slurm
c1:
image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
command: ["slurmd"]
hostname: c1
container_name: c1
volumes:
- etc_munge:/etc/munge
- etc_slurm:/etc/slurm
- slurm_jobdir:/data
- var_log_slurm:/var/log/slurm
- cgroups:/sys/fs/cgroup:ro
expose:
- "6818"
depends_on:
- "slurmctld"
privileged: true
cgroup: host
#network_mode: host
networks:
- slurm
c2:
image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
command: ["slurmd"]
hostname: c2
container_name: c2
volumes:
- etc_munge:/etc/munge
- etc_slurm:/etc/slurm
- slurm_jobdir:/data
- var_log_slurm:/var/log/slurm
- cgroups:/sys/fs/cgroup:ro
expose:
- "6818"
- "22"
depends_on:
- "slurmctld"
privileged: true
cgroup: host
networks:
- slurm
#network_mode: host
volumes:
etc_munge:
etc_slurm:
slurm_jobdir:
var_lib_mysql:
var_log_slurm:
grafana_data:
prometheus_data:
cgroups:
etc_prometheus:
networks:
prom_app_net:
slurm:
enable_ipv6: true
ipam:
config:
- subnet: 2001:0DB8::/112
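One thing worth noting about this file: prometheus is only attached to prom_app_net while slurmctld is only attached to slurm, so Prometheus presumably reaches the exporter through the ports published on the host rather than by container name. If scraping by name were preferred, one possible change (an assumption, not part of the setup above) would be to attach slurmctld to both networks:

  slurmctld:
    networks:
      - slurm
      - prom_app_net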
I then installed query-exporter in the slurmctld container and ran it with the following config.yaml:
databases:
  db1:
    dsn: sqlite:////test.db
    connect-sql:
      - PRAGMA application_id = 123
      - PRAGMA auto_vacuum = 1
    labels:
      region: us1
      app: app1
metrics:
  metric1:
    type: gauge
    description: A sample gauge
queries:
  query1:
    interval: 5
    databases: [db1]
    metrics: [metric1]
    sql: SELECT random() / 1000000000000000 AS metric1
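If the exporter were reachable, its /metrics endpoint would serve the configured gauge in the standard Prometheus exposition format, roughly like the illustrative sample below (the exact label set, such as the automatically added database label, depends on the query-exporter version, and the value is random):

# HELP metric1 A sample gauge
# TYPE metric1 gauge
metric1{app="app1",database="db1",region="us1"} 42.0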
But it is not working: Prometheus lists the target as down. The container set-up itself seems to be fine, though, because if I run the following test exporter instead:
from prometheus_client import start_http_server, Summary
import random
import time

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    """A dummy function that takes some time."""
    time.sleep(t)

if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8082)
    # Generate some requests.
    while True:
        process_request(random.random())
then Prometheus can connect to the target fine.
Can anyone see what the problem could be?
Thanks!
Update
I run query-exporter by hand on the slurmctld container, so there isn't anything about query-exporter in the container logs:
2023-07-10 10:11:37 ---> Starting the MUNGE Authentication service (munged) ...
2023-07-10 10:11:37 ---> Waiting for slurmdbd to become active before starting slurmctld ...
2023-07-10 10:11:37 -- slurmdbd is not available. Sleeping ...
2023-07-10 10:11:39 -- slurmdbd is now active ...
2023-07-10 10:11:39 ---> starting systemd ...
I think the test_query.py that works is using IPv4 on port 8082, while query-exporter is trying to bind IPv6.
docker port slurmctld
gives:
8080/tcp -> 0.0.0.0:8080
8080/tcp -> [::]:8080
8081/tcp -> 0.0.0.0:8081
8081/tcp -> [::]:8081
8082/tcp -> 0.0.0.0:8082
8082/tcp -> [::]:8082
I guess I need to point Prometheus at 8082/tcp -> [::]:8082 when query-exporter runs, but I'm not sure how to do that.
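One way to confirm which address family the exporter is actually bound to is to probe both loopback addresses from inside the container; this is only a sketch and it assumes curl is available in the slurmctld image:

# IPv4 loopback: returns metrics if the exporter listens on 0.0.0.0 or 127.0.0.1
docker exec slurmctld curl -sg http://127.0.0.1:8082/metrics | head -n 3
# IPv6 loopback: returns metrics if the exporter listens on [::]
docker exec slurmctld curl -sg "http://[::1]:8082/metrics" | head -n 3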
Answer 1
Score: 0
Running with query-exporter config.yaml -H 0.0.0.0 -p 8082
gets it to work.
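Presumably the exporter was previously binding only to its default address and port, so nothing was listening on the container's 8082 where Prometheus was looking; -H 0.0.0.0 makes it listen on all IPv4 interfaces and -p 8082 matches the port published in the compose file. For completeness, a matching scrape job might look like the sketch below. The original prometheus.yml is not shown, so the job name and the docker-host target are assumptions (use whatever address the prometheus container uses to reach the published port, or slurmctld directly if the two containers share a network):

scrape_configs:
  - job_name: 'query-exporter'
    static_configs:
      - targets: ['docker-host:8082']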