

When to use a gauge or a histogram in Prometheus to record request duration?








I'm new to metric monitoring.

If we want to record the duration of requests, I think we should use a gauge, but in practice some people use a histogram.

For example, grpc-ecosystem/go-grpc-prometheus uses a histogram to record duration. Are there agreed best practices for choosing metric types, or is it just their own preference?

// ServerMetrics represents a collection of metrics to be registered on a
// Prometheus metrics registry for a gRPC server.
type ServerMetrics struct {
	serverStartedCounter          *prom.CounterVec
	serverHandledCounter          *prom.CounterVec
	serverStreamMsgReceived       *prom.CounterVec
	serverStreamMsgSent           *prom.CounterVec
	serverHandledHistogramEnabled bool
	serverHandledHistogramOpts    prom.HistogramOpts
	serverHandledHistogram        *prom.HistogramVec
}



Score: 4














I am new to this, but let me try to answer your question. Take my answer with a grain of salt; perhaps someone with more experience using metrics to observe their systems will jump in.

As stated in https://prometheus.io/docs/concepts/metric_types/:

> A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

So if your goal is to display the current value (the duration of the most recent request), you could use a gauge. But I think the goal of using metrics is to find problems within your system, to generate alerts when certain values fall outside a predefined range, or to get a performance value (like the Apdex score) for your system.

From https://prometheus.io/docs/concepts/metric_types/#histogram

>Use the histogram_quantile() function to calculate quantiles from histograms or even aggregations of histograms. A histogram is also suitable to calculate an Apdex score.

From https://en.wikipedia.org/wiki/Apdex

>Apdex (Application Performance Index) is an open standard developed by an alliance of companies for measuring performance of software applications in computing. Its purpose is to convert measurements into insights about user satisfaction, by specifying a uniform way to analyze and report on the degree to which measured performance meets user expectations.
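
To make the Apdex idea concrete, here is a minimal Go sketch of the standard Apdex formula applied to cumulative histogram bucket counts. The function name `apdex`, the thresholds, and the counts are all made up for illustration; this is not code from Prometheus or from any client library.

```go
package main

import "fmt"

// apdex computes an Apdex score from cumulative bucket counts.
//   satisfied:         observations <= T (the target duration)
//   toleratedOrBetter: observations <= 4T (cumulative, so it includes satisfied)
//   total:             all observations
// The score is (satisfied + tolerating/2) / total.
func apdex(satisfied, toleratedOrBetter, total float64) float64 {
	if total == 0 {
		// No samples: the score is undefined. Returning 0 is an assumption
		// made for this sketch only.
		return 0
	}
	tolerating := toleratedOrBetter - satisfied
	return (satisfied + tolerating/2) / total
}

func main() {
	// Hypothetical counts: 800 requests within the target, 950 within
	// four times the target, 1000 requests in total.
	fmt.Printf("%.3f\n", apdex(800, 950, 1000)) // prints 0.875
}
```

A gauge cannot feed this calculation, because it does not keep counts of how many requests fell under each threshold.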

Read up on quantiles and how they are calculated for histograms and summaries: https://prometheus.io/docs/practices/histograms/#quantiles

Two rules of thumb:

  1. If you need to aggregate, choose histograms.
  2. Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.
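
A rough sketch of why histograms aggregate well: cumulative bucket counts from several instances can simply be added, as long as every instance uses the same bucket boundaries. The boundaries, counts, and the helper `mergeBuckets` below are hypothetical and only illustrate the idea; precomputed quantiles from a summary cannot be combined like this.

```go
package main

import "fmt"

// bucketBounds are hypothetical upper bounds in seconds, shared by all instances.
var bucketBounds = []float64{0.1, 0.5, 1, 5}

// mergeBuckets adds up cumulative bucket counts from several instances.
// This is only meaningful because every instance uses the same boundaries.
func mergeBuckets(instances ...[]uint64) []uint64 {
	merged := make([]uint64, len(bucketBounds))
	for _, counts := range instances {
		for i := range merged {
			if i < len(counts) {
				merged[i] += counts[i]
			}
		}
	}
	return merged
}

func main() {
	fast := []uint64{90, 98, 100, 100} // instance with mostly quick requests
	slow := []uint64{10, 40, 80, 100}  // instance with slower requests

	merged := mergeBuckets(fast, slow)
	for i, le := range bucketBounds {
		fmt.Printf("le=%g: %d\n", le, merged[i])
	}
}
```

The merged counts describe the request durations of both instances together, so a quantile estimated from them covers the whole service rather than one process.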

Or, as Adam Woodbeck puts it in his book "Network Programming with Go":

>The general advice is to use summaries when you don’t know the range of expected values, but I’d advise you to use histograms whenever possible so that you can aggregate histograms on the metrics server.


Score: 4





The main difference between the gauge and histogram metric types in Prometheus is that Prometheus captures only a single (last) value of a gauge metric when it scrapes the target exposing the metric, while a histogram captures all the metric values by incrementing the corresponding histogram bucket.

For example, if request duration is measured for a frequently requested endpoint and Prometheus is set up to scrape your app every 30 seconds (e.g. scrape_interval: 30s in scrape_configs), then Prometheus will scrape only a single duration, that of the last request, every 30 seconds when the duration is stored in a Gauge metric. All the previous measurements of the request duration are lost.

On the other hand, any number of request duration measurements are registered in a Histogram metric, and this doesn't depend on the interval between scrapes of your app. Later, the Histogram metric allows obtaining the distribution of request durations over an arbitrary time range.
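
The difference can be sketched with a toy model in plain Go. The `gauge` and `histogram` types below are simplified stand-ins written for this illustration (only the method names `Set` and `Observe` mirror the Go client library), and the request durations are invented.

```go
package main

import "fmt"

// gauge keeps only the most recently set value.
type gauge struct{ value float64 }

func (g *gauge) Set(v float64) { g.value = v }

// histogram counts every observation in cumulative buckets
// and also tracks the running sum and count.
type histogram struct {
	bounds []float64 // upper bounds in seconds, ascending
	counts []uint64  // counts[i] = observations <= bounds[i]
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

func (h *histogram) Observe(v float64) {
	for i, le := range h.bounds {
		if v <= le {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	g := &gauge{}
	h := newHistogram([]float64{0.1, 0.5, 1})

	// Hypothetical request durations (seconds) seen between two scrapes.
	for _, d := range []float64{0.05, 0.07, 2.5, 0.3, 0.06} {
		g.Set(d)
		h.Observe(d)
	}

	fmt.Println("gauge at scrape time:", g.value)      // gauge at scrape time: 0.06
	fmt.Println("histogram buckets:", h.counts)        // histogram buckets: [3 4 4]
	fmt.Println("slower than 1s:", h.count-h.counts[2]) // slower than 1s: 1
}
```

In this run the 2.5 s request never shows up in the gauge, while the histogram still records that one of five requests took longer than one second.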

Prometheus histograms have some issues though:

  • You need to choose the number and the boundaries of histogram buckets so that they provide good accuracy for observing the distribution of the measured metric. This isn't a trivial task, since you may not know the real distribution of the metric in advance.
  • If the number of buckets or their boundaries are changed for some measurement, then the histogram_quantile() function returns invalid results over such a measurement.
  • Too many buckets per histogram may result in high cardinality issues, since each bucket in the histogram creates a separate time series.
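
To illustrate why bucket boundaries matter, here is a simplified Go sketch of quantile estimation by linear interpolation inside a bucket, which is the general approach histogram_quantile() takes. The function `estimateQuantile`, the bucket layouts, and the counts are assumptions made for this example; it skips details such as the +Inf bucket and is not the Prometheus implementation.

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket:
// count is the number of observations <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile finds the bucket that contains the q-th observation and
// interpolates linearly between that bucket's lower and upper bound.
// Assumes buckets are non-empty, sorted ascending, and start at 0.
func estimateQuantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	lower, prevCount := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			inBucket := b.count - prevCount
			if inBucket == 0 {
				return b.upperBound // avoid dividing by zero
			}
			return lower + (b.upperBound-lower)*(rank-prevCount)/inBucket
		}
		lower, prevCount = b.upperBound, b.count
	}
	return buckets[len(buckets)-1].upperBound
}

func main() {
	// The same 100 hypothetical requests: 90 took 0.1-0.2s, 10 took 0.2-0.3s.
	fine := []bucket{{0.1, 0}, {0.2, 90}, {0.3, 100}, {1, 100}}
	coarse := []bucket{{0.1, 0}, {1, 100}}

	fmt.Printf("median, fine buckets:   %.3f\n", estimateQuantile(0.5, fine))   // 0.156
	fmt.Printf("median, coarse buckets: %.3f\n", estimateQuantile(0.5, coarse)) // 0.550
}
```

With the coarse layout the estimated median lands far from where the requests actually fall, because the estimate assumes observations are spread evenly across the whole 0.1 s to 1 s bucket.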

P.S. These issues are addressed in VictoriaMetrics histograms (I'm the core developer of VictoriaMetrics).


Score: 1





As valyala suggests, the main difference is that a histogram aggregates data, so you can take advantage of Prometheus's query functions over all recorded samples (averages, quantiles, etc.).

A gauge is better suited to measuring things like "wind velocity", "queue size", or any other kind of "instant data", where losing older samples matters little because you want the current picture.

Using gauges for "duration of the requests" would require very short scrape intervals to be accurate, which is not practical even if your request rate is not very high (whenever more than one request arrives between two scrapes, all but the last one are missed). So, in summary, don't use gauges here. A histogram fits your needs much better.

  • Posted on April 6, 2022 at 22:21:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71768510.html



