Custom latency metrics in Datadog
Question
We have been using Datadog (DD) successfully for tracing and monitoring at our company. We have a microservices architecture (services written in Go) where the flow looks like this:
Incoming request -> Proxy service -> AWS EventBridge -> Service1 -> Service2
We can see the trace in Datadog, but we want to measure how long it takes for the request to reach Service2 and alert when that exceeds a threshold. When we reached out to DD support, they said measuring this latency from the traces we already have isn't supported yet.
So I was thinking of emitting a custom metric from Service2 that measures the time between the request hitting the Proxy service and it reaching Service2. I can't find any pointers on how to do this. Any help (preferably in Go) would be appreciated.
Answer 1
Score: 2
I'd recommend tracking timing information (current timestamp minus the header timestamp) as a histogram. By default, the Agent aggregates each histogram into average, median, max, and 95th-percentile gauges, plus a count of samples submitted as a rate.
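To make that default aggregation concrete, here is a small stdlib-only sketch of the per-interval summary. It is illustrative only: the Datadog Agent computes these values itself, and the nearest-rank percentile method and the `summarize`/`histogramSummary` names are assumptions chosen for this example, not the Agent's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// histogramSummary holds the default aggregates for one flush interval:
// average, median, max, 95th percentile, and sample count.
type histogramSummary struct {
	Avg, Median, Max, P95 float64
	Count                 int
}

// nearestRank returns the p-th percentile (0 < p <= 100) of an ascending,
// non-empty slice using the nearest-rank method (an illustrative choice).
func nearestRank(sorted []float64, p float64) float64 {
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize aggregates the latency samples (in ms) collected during one
// interval. It copies and sorts the input so the caller's slice is untouched.
func summarize(samples []float64) histogramSummary {
	if len(samples) == 0 {
		return histogramSummary{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	return histogramSummary{
		Avg:    sum / float64(len(sorted)),
		Median: nearestRank(sorted, 50),
		Max:    sorted[len(sorted)-1],
		P95:    nearestRank(sorted, 95),
		Count:  len(sorted),
	}
}

func main() {
	s := summarize([]float64{120, 80, 95, 300, 110})
	fmt.Printf("avg=%.0f median=%.0f max=%.0f p95=%.0f count=%d\n",
		s.Avg, s.Median, s.Max, s.P95, s.Count)
	// avg=141 median=110 max=300 p95=300 count=5
}
```

Here one slow 300 ms request pulls the average up to 141 while the median stays at 110. In Datadog the aggregates appear as separate metrics with `.avg`, `.median`, `.max`, `.95percentile`, and `.count` suffixes, so a metric monitor on the `.95percentile` series is one way to get the threshold alert asked about in the question.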
Sounds like your latency measurement is the current time on Service2 minus the timestamp the Proxy service puts in the header. Using the datadog-go library it looks something like:
// client is a *statsd.Client, e.g. client, err := statsd.New("127.0.0.1:8125")
client.Histogram("latency.proxy_to_server2", float64(your_measurement), []string{"tag:value"}, 1)
Datadog's docs go into more detail about how to submit metrics.
Depending on your Datadog agent configuration, you may already have automatic tagging. If not, and you're running several Service2 tasks/pods/instances, then you might want to add some global tags during statsd client initialization.
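To show what global tags amount to on the wire, here is a stdlib-only sketch of a tiny, hypothetical DogStatsD client. It formats a histogram datagram as `<name>:<value>|h|#tag1,tag2` and sends it over UDP to port 8125, appending the tags given at initialization to every metric. With datadog-go itself you would pass `statsd.WithTags(...)` to `statsd.New` instead; the `tinyStatsd` type and its methods exist only for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// tinyStatsd is a hypothetical, minimal DogStatsD sender that merges
// global tags (set once at initialization) into every metric it sends.
type tinyStatsd struct {
	conn       net.Conn
	globalTags []string
}

func newTinyStatsd(addr string, globalTags []string) (*tinyStatsd, error) {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return nil, err
	}
	return &tinyStatsd{conn: conn, globalTags: globalTags}, nil
}

// histogramLine builds one DogStatsD histogram datagram, global tags first.
func (c *tinyStatsd) histogramLine(name string, value float64, tags []string) string {
	line := name + ":" + strconv.FormatFloat(value, 'f', -1, 64) + "|h"
	all := append(append([]string{}, c.globalTags...), tags...)
	if len(all) > 0 {
		line += "|#" + strings.Join(all, ",")
	}
	return line
}

// Histogram sends the datagram; UDP is fire-and-forget, so a missing Agent
// usually goes unnoticed rather than returning an error.
func (c *tinyStatsd) Histogram(name string, value float64, tags []string) error {
	_, err := c.conn.Write([]byte(c.histogramLine(name, value, tags)))
	return err
}

func main() {
	c, err := newTinyStatsd("127.0.0.1:8125", []string{"env:prod", "service:service2"})
	if err != nil {
		panic(err)
	}
	defer c.conn.Close()

	fmt.Println(c.histogramLine("latency.proxy_to_server2", 250, []string{"tag:value"}))
	// latency.proxy_to_server2:250|h|#env:prod,service:service2,tag:value
	if err := c.Histogram("latency.proxy_to_server2", 250, []string{"tag:value"}); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

Setting per-instance values such as a pod or host name once at initialization keeps every call site short and ensures no metric is emitted without them.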