Can't expose Flink metrics to Prometheus
Question
I'm trying to expose Flink's built-in metrics to Prometheus, but somehow Prometheus doesn't recognize the targets, neither the JMX one nor the PrometheusReporter one.
The scraping defined in prometheus.yml looks like this:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'kafka-server'
    static_configs:
      - targets: ['localhost:7071']
  - job_name: 'flink-jmx'
    static_configs:
      - targets: ['localhost:8789']
  - job_name: 'flink-prom'
    static_configs:
      - targets: ['localhost:9249']
And my flink-conf.yml has the following lines:
#metrics.reporters: jmx, prom
metrics.reporters: jmx, prometheus
#metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
However, both Flink targets are down when running a WordCount
- in IntelliJ
- as jar: java -jar target/flink-word-count.jar --input src/main/resources/loremipsum.txt
- as Flink job: flink run target/flink-word-count.jar --input src/main/resources/loremipsum.txt
According to the Flink docs I don't need any additional dependencies for JMX, and for the Prometheus reporter I only need a copy of the provided flink-metrics-prometheus-1.10.0.jar in flink/lib/.
What am I doing wrong? What is missing?
Answer 1
Score: 1
That particular job is going to run to completion pretty quickly, I believe. Once you get the setup working there may be no interesting metrics because the job doesn't run long enough for anything to show up.
When you run with a mini-cluster (as java -jar ...), the flink-conf.yaml file isn't loaded (unless you've done something rather special in your job to get it loaded). Note also that this file normally has a .yaml extension; I'm not sure whether it works if .yml is used instead.
You can check the job manager and task manager logs to make sure that the reporters are being loaded.
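Besides reading the logs, a quick way to see whether anything is listening on a reporter port is to try a TCP connection to it. The following is only a minimal sketch: the helper names `is_listening` and `listening_ports` are made up for illustration, and an open port only shows that some process accepted the connection, not that it is a Flink reporter.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def listening_ports(host, first, last):
    """Return the ports in the inclusive range [first, last] that accept connections."""
    return [port for port in range(first, last + 1) if is_listening(host, port)]
```

For example, `listening_ports("localhost", 9250, 9260)` would list the ports from a configured reporter port range that are actually open, which are the ones worth adding as Prometheus targets.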
FWIW, the last time I did this I used this setup, so that I could scrape from multiple processes:
# flink-conf.yaml
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
# prometheus.yml
scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
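One more thing that may be worth checking mechanically: as far as I know, Flink only starts the reporters named in metrics.reporters and reads each one's settings from the metrics.reporter.<name>.* keys, so the names have to match. The config in the question lists prometheus but configures prom. Here is a rough sketch of such a check. The function names are hypothetical, and the parser only handles flat "key: value" lines, not full YAML.

```python
def parse_flink_conf(text):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def unconfigured_reporters(text):
    """Return names listed in metrics.reporters that have no metrics.reporter.<name>.* key."""
    conf = parse_flink_conf(text)
    listed = [n.strip() for n in conf.get("metrics.reporters", "").split(",") if n.strip()]
    prefix = "metrics.reporter."
    # The reporter name is the first path segment after the prefix.
    configured = {key[len(prefix):].split(".", 1)[0] for key in conf if key.startswith(prefix)}
    return [name for name in listed if name not in configured]
```

Run on the flink-conf lines quoted in the question, `unconfigured_reporters` returns `['prometheus']`, because the settings there are keyed under prom rather than prometheus.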