Monitor a batch job with Prometheus

Question


There is a Python batch job that pushes huge file(s) to a shared location. Once the file(s) are pushed, a couple of tests are run against them.
I'm trying to collect some metrics around the batch job and am planning to use Node exporter with the metrics or labels below.

file_push_status (success or failure)
first_test_status (pass or fail)
second_test_status (pass or fail)
first_test_time_taken (how long)
second_test_time_taken (how long)

I've gone through the Prometheus documentation, but it's not clear to me whether a Summary or a Histogram should be used here. I understand that Prometheus doesn't support Boolean values (the first three cases), so how should those be handled?

If needed, I will attach the existing batch job code. Thank you.

Answer 1

Score: 1

For a small number of files, you don't need histograms.

Make all three metrics gauges.

Something like:

# HELP file_push_success A metric with 0/1 value showing result of file push job. 0 - failure.
# TYPE file_push_success gauge
file_push_success{file="filename.txt"} 1

# HELP file_push_test_success A metric with 0/1 value showing result of corresponding test after file being pushed. 0 - failure.
# TYPE file_push_test_success gauge
file_push_test_success{file="filename.txt", test="1"} 1
file_push_test_success{file="filename.txt", test="2"} 0

# HELP file_push_test_duration_seconds Duration of corresponding test after file being pushed
# TYPE file_push_test_duration_seconds gauge
file_push_test_duration_seconds{file="filename.txt", test="1"} 5
file_push_test_duration_seconds{file="filename.txt", test="2"} 13

Here I grouped related metrics into one metric with different labels. This is easier to maintain (for example, when you decide to add new tests) and is generally advised by the Prometheus documentation.
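Since the question mentions Node exporter, one way the batch job could produce the output above is to write a `.prom` file for Node exporter's textfile collector. The following is only a minimal sketch under assumptions, not the asker's actual job: the helper names (`run_timed`, `render_metrics`, `write_atomically`) and the output path are hypothetical, and the directory must match whatever `--collector.textfile.directory` is set to on the host. It uses only the standard library; pass/fail results are mapped to 1/0 and durations are recorded in seconds.

```python
import os
import tempfile
import time


def escape_label(value):
    """Escape backslash, double quote and newline for a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def run_timed(test_func):
    """Run a test callable and return (passed, duration_seconds)."""
    start = time.perf_counter()
    try:
        passed = bool(test_func())
    except Exception:
        passed = False  # assumption: a crashing test counts as a failure
    return passed, time.perf_counter() - start


def render_metrics(file_name, push_ok, test_results):
    """Render the three gauges in the Prometheus text exposition format.

    test_results maps a test label (e.g. "1") to (passed, duration_seconds).
    """
    f = escape_label(file_name)
    lines = [
        "# HELP file_push_success 1 if the file push succeeded, 0 if it failed.",
        "# TYPE file_push_success gauge",
        f'file_push_success{{file="{f}"}} {int(push_ok)}',
        "# HELP file_push_test_success 1 if the test passed, 0 if it failed.",
        "# TYPE file_push_test_success gauge",
    ]
    for test, (passed, _) in test_results.items():
        t = escape_label(test)
        lines.append(f'file_push_test_success{{file="{f}",test="{t}"}} {int(passed)}')
    lines += [
        "# HELP file_push_test_duration_seconds Duration of the test in seconds.",
        "# TYPE file_push_test_duration_seconds gauge",
    ]
    for test, (_, duration) in test_results.items():
        t = escape_label(test)
        lines.append(
            f'file_push_test_duration_seconds{{file="{f}",test="{t}"}} {duration:.3f}'
        )
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file and rename it, so a scrape never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter must be able to read it
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Placeholder tests; the real job would call its own test functions here.
    results = {"1": run_timed(lambda: True), "2": run_timed(lambda: False)}
    text = render_metrics("filename.txt", True, results)
    print(text)
    # Hypothetical path; adjust to the configured textfile directory:
    # write_atomically("/var/lib/node_exporter/textfile_collector/file_push.prom", text)
```

The textfile collector only picks up files ending in `.prom`, so the temporary file is ignored until it is renamed into place. Keep in mind that gauges written this way show the result of the most recent run only.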


huangapple
  • Published on April 4, 2023, 13:18:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75925741.html