Can't expose Flink metrics to Prometheus

Question

I'm trying to expose Flink's built-in metrics to Prometheus, but Prometheus doesn't recognize the targets: neither the JMXReporter nor the PrometheusReporter.

The scraping defined in prometheus.yml looks like this:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'kafka-server'
    static_configs:
      - targets: ['localhost:7071']

  - job_name: 'flink-jmx'
    static_configs:
      - targets: ['localhost:8789']

  - job_name: 'flink-prom'
    static_configs:
      - targets: ['localhost:9249']

And my flink-conf.yml has the following lines:

#metrics.reporters: jmx, prom
metrics.reporters: jmx, prometheus

#metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
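
In the flink-conf lines above, metrics.reporters lists "jmx, prometheus", while the Prometheus reporter's options are declared under the name "prom". As I read the Flink metrics documentation, once metrics.reporters is set, only reporters whose names appear in that list are started, so this mismatch by itself could keep the prom reporter from starting. The following is a hypothetical helper (not a Flink tool) that cross-checks the two sets of names; it assumes the file uses Flink's flat "key: value" lines and skips "#" comments.

```python
import re
from typing import Dict, Optional, Set

# Matches keys like "metrics.reporter.prom.class" and captures "prom".
REPORTER_KEY = re.compile(r"^metrics\.reporter\.([^.]+)\.")


def check_reporter_names(conf_text: str) -> Dict[str, Set[str]]:
    """Compare names in metrics.reporters with names used in metrics.reporter.<name>.* keys."""
    enabled: Optional[Set[str]] = None  # stays None if metrics.reporters is absent
    configured: Set[str] = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, comments and lines that are not key: value
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "metrics.reporters":
            enabled = {name.strip() for name in value.split(",") if name.strip()}
        else:
            match = REPORTER_KEY.match(key)
            if match:
                configured.add(match.group(1))
    if enabled is None:
        # No explicit list: every configured reporter would be started.
        return {"listed_but_unconfigured": set(), "configured_but_not_listed": set()}
    return {
        "listed_but_unconfigured": enabled - configured,
        "configured_but_not_listed": configured - enabled,
    }
```

Fed the lines above, it should report prometheus as listed but unconfigured and prom as configured but not listed, e.g. print(check_reporter_names(open("conf/flink-conf.yaml").read())) with the path adjusted to your installation.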

However, both Flink targets are down when I run a WordCount job:

  • in IntelliJ
  • as jar: java -jar target/flink-word-count.jar --input src/main/resources/loremipsum.txt
  • as Flink job: flink run target/flink-word-count.jar --input src/main/resources/loremipsum.txt
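
Prometheus marks a target as down when an HTTP GET of its metrics path (/metrics by default) fails. To reproduce that check by hand, here is a small hypothetical helper (not part of Flink or Prometheus) that issues the same kind of request, bypassing any HTTP proxy as a direct scrape would. One caveat, as far as I know: the JMX reporter's port serves JMX over RMI rather than HTTP, so localhost:8789 would show up as down here and in Prometheus even with the reporter running; scraping JMX normally goes through a separate bridge such as the Prometheus JMX exporter.

```python
import http.client
import urllib.request

# An opener with an empty ProxyHandler talks to the target directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(host: str, port: int, path: str = "/metrics", timeout: float = 2.0) -> bool:
    """Return True if GET http://host:port/path answers with HTTP 200, else False."""
    url = f"http://{host}:{port}{path}"
    try:
        with _DIRECT.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses: refused connection, timeout,
        # or an error status. HTTPException covers replies that are not HTTP at all.
        return False


if __name__ == "__main__":
    # Ports taken from the flink-jmx and flink-prom jobs in prometheus.yml.
    for port in (8789, 9249):
        print(f"localhost:{port}", "up" if probe("localhost", port) else "down")
```

Running it while the job is executing tells you whether anything is listening on the reporter ports at all, independently of Prometheus.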

According to the Flink docs, JMX needs no additional dependencies, and the Prometheus reporter only needs the provided flink-metrics-prometheus-1.10.0.jar copied into flink/lib/, which I have done.

What am I doing wrong? What is missing?

Answer 1

Score: 1

That particular job is going to run to completion pretty quickly, I believe. Once you get the setup working, there may be no interesting metrics, because the job doesn't run long enough for anything to show up.

When you run with a mini-cluster (as with java -jar ...), the flink-conf.yaml file isn't loaded (unless you've done something rather special in your job to get it loaded). Note also that this file normally has a .yaml extension; I'm not sure whether it works if .yml is used instead.
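
On the .yml question: as far as I know, Flink looks for a file named exactly flink-conf.yaml in its conf directory, so a flink-conf.yml next to it would simply be ignored. A hypothetical helper to spot this (it only matters for flink run against a real cluster, since the mini-cluster does not read the file at all):

```python
from pathlib import Path
from typing import List, Optional, Tuple

EXPECTED_NAME = "flink-conf.yaml"


def check_conf_dir(conf_dir: str) -> Tuple[Optional[Path], List[Path]]:
    """Return (path to flink-conf.yaml or None, other flink-conf* look-alikes)."""
    directory = Path(conf_dir)
    expected = directory / EXPECTED_NAME
    lookalikes = sorted(
        p for p in directory.glob("flink-conf*") if p.name != EXPECTED_NAME
    )
    return (expected if expected.is_file() else None), lookalikes


if __name__ == "__main__":
    import sys

    # The directory argument is an assumption; point it at your Flink conf directory.
    found, others = check_conf_dir(sys.argv[1] if len(sys.argv) > 1 else "conf")
    print("flink-conf.yaml:", found if found else "MISSING")
    for path in others:
        print("possibly ignored:", path)
```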

You can check the job manager and task manager logs to make sure that the reporters are being loaded.
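
Going through the logs can be scripted. The exact wording of Flink's log messages varies between versions, so this hypothetical sketch only collects lines that mention the reporter class names; it assumes the logs are *.log files in one directory (typically the log/ directory of the Flink distribution). Hits are a hint that a reporter was at least configured; no hits at all suggests the configuration was never picked up.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Simple class names also match the fully qualified names in log lines.
REPORTER_NAMES = ("PrometheusReporter", "JMXReporter")


def reporter_mentions(
    log_dir: str, needles: Sequence[str] = REPORTER_NAMES
) -> Dict[str, List[str]]:
    """Map each *.log file in log_dir to its lines mentioning one of the needles."""
    hits: Dict[str, List[str]] = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        text = log_file.read_text(encoding="utf-8", errors="replace")
        matching = [line for line in text.splitlines() if any(n in line for n in needles)]
        if matching:
            hits[log_file.name] = matching
    return hits
```

For example, for name, lines in reporter_mentions("log").items(): print(name, *lines, sep="\n  ") lists the relevant lines per log file.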

FWIW, the last time I did this I used this setup, so that I could scrape from multiple processes:

# flink-conf.yaml

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260

# prometheus.yml

scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
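
With a port range, as I understand it, each Flink process (the job manager and every task manager on the host) runs its own reporter instance, and each one binds a free port from the range, so which process lands on which port depends on start-up order. The prometheus.yml above lists only the first two ports. A hypothetical helper can expand the whole range into scrape targets instead; ports with no process behind them simply show up as down in Prometheus.

```python
from typing import List


def scrape_targets(port_spec: str, host: str = "localhost") -> List[str]:
    """Expand a reporter port spec such as '9250-9260' (or a single '9249') into targets."""
    if "-" in port_spec:
        start, end = (int(part) for part in port_spec.split("-", 1))
    else:
        start = end = int(port_spec)
    if start > end:
        raise ValueError(f"invalid port range: {port_spec!r}")
    return [f"{host}:{port}" for port in range(start, end + 1)]


if __name__ == "__main__":
    # Python's list repr is also valid YAML flow syntax for the targets entry.
    print("targets:", scrape_targets("9250-9260"))
```

scrape_targets("9250-9260") yields eleven targets starting with localhost:9250 and localhost:9251, matching the two entries in the config above.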

huangapple
  • Posted on 17 September 2020, 16:31:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63934199.html