Prometheus has no TaskManager metrics once the Flink job has started
Question
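I run Flink 1.15.2 on Kubernetes with the following metrics configuration for the Flink cluster:
```yaml
# Metrics configuration
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
```
The problem is that once the Flink job has started, Prometheus no longer gets metrics from the TaskManager. If I stop the job, the metrics become visible again, although some of them are empty.
- I tried reducing CPU usage, but there were still no metrics from the TaskManager.
- I tried increasing the number of task slots; still no metrics.
- It happens on both Intel and ARM nodes.
- I changed the Flink configuration as below; metrics were collected for a few seconds and then disappeared again.
  ```yaml
  # Metrics configuration
  metrics.reporters: prom
  metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  ```
- I changed the job's Kafka connector settings as below, but there were still no metrics.
  ```java
  // Disable registration of Kafka consumer metrics on the source
  kafkaSourceBuilder.setProperty("register.consumer.metrics", "false");
  // Disable registration of Kafka producer metrics on the sink
  var producerProperties = new Properties();
  producerProperties.setProperty("register.producer.metrics", "false");
  producerSinkBuilder.setKafkaProducerConfig(producerProperties);
  ```
- If I start the job on Flink 1.15.3, metrics are collected.
- If I start the job on Flink 1.16.0, Prometheus gets no metrics from Flink at all.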
<details>
<summary>Original (English):</summary>

I operate Flink 1.15.2 on Kubernetes and set the metrics configuration for the Flink cluster as below:

```yaml
# metrics
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
```

The problem is that Prometheus doesn't get metrics from the TaskManager once the Flink job has started.
If I stop the job, I can see the metrics; however, some metrics are empty.

1. I tried to reduce CPU usage, but still no metrics from the TaskManager.
2. I tried to increase the number of task slots; still no metrics.
3. It happens on both Intel and ARM nodes.
4. I tried to change the Flink config as below; metrics were collected for a moment (several seconds) and disappeared again.

   ```yaml
   # metrics
   metrics.reporters: prom
   metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
   ```

5. I tried to change the Kafka connector settings as below, but still no metrics.

   ```java
   kafkaSourceBuilder.setProperty("register.consumer.metrics", "false");
   var producerProperties = new Properties();
   producerProperties.setProperty("register.producer.metrics", "false");
   producerSinkBuilder.setKafkaProducerConfig(producerProperties);
   ```

6. If I start the job on Flink 1.15.3, metrics are collected.
7. If I start the job on Flink 1.16.0, Prometheus doesn't have any metrics from Flink at all.
</details>
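# Answer 1
**Score**: 2
As mentioned in the release notes of Flink 1.16, configuring reporters by their class has been deprecated. See https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.16/#flink-27206httpsissuesapacheorgjirabrowseflink-27206 for details.
There are also some known issues with metrics reporting in 1.16.0; please upgrade to Flink 1.16.1.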
<details>
<summary>Original (English):</summary>
As mentioned in the release notes of Flink 1.16, configuring reporters by their class has been deprecated. See https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.16/#flink-27206httpsissuesapacheorgjirabrowseflink-27206 for details.
There are also some known issues with metrics reporting in 1.16.0; please upgrade to Flink 1.16.1.
</details>
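As a rough sketch of the migration the answer points to: the Prometheus reporter is configured through its factory, under the `factory.class` key rather than `class`. The reporter name `prom` is taken from the question. The port line is optional and only spells out the reporter's documented default, so treat it as an assumption about your setup.

```yaml
# Metrics configuration (factory-based; intended for Flink 1.16.x as discussed in the answer)
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
# Optional (assumption): port the reporter listens on; 9249 is the documented default
metrics.reporter.prom.port: 9249
```

Note that the factory class name goes under `factory.class`, not under `class` as in the question's second attempt. Whether this alone restores the TaskManager metrics in a given cluster still has to be checked against the Prometheus scrape targets.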