Obtaining CPU Usage with Prometheus and cAdvisor in Grafana
Question
I'm pretty new to the PromQL language, and so I'm running into an issue where I'm trying to obtain CPU usage per container in a "Time series" chart, but I can't figure out how to divide by the number of total cores (I prefer to view CPU utilization to a maximum scale of 100%). Here's the query I'm attempting to use:
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) / sum(machine_cpu_cores)) by (name)
This doesn't work. I thought that since "sum(machine_cpu_cores)" is simply returning the sum of total cores (in my case 8), that I could divide by that, but I guess this isn't the case. Instead, I took that out and manually substituted the number 8 as shown below:
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) / 8) by (name)
Manually putting in "8" to represent the number of cores makes this work, but I wanted to use a query closer to the first example that returns the number of cores - instead of having to input the number. Is there something I can do to make that work?
Answer 1
Score: 1
As you probably guessed, the problem lies in the division.
rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) returns a vector whose series carry the same labels as the metric container_cpu_usage_seconds_total, whereas sum(machine_cpu_cores) returns a vector with a single series that has no labels.
When one vector is divided by another, Prometheus pairs up series whose label sets are identical and returns a result only for those pairs. Since no series on the left has the same (empty) label set as the series on the right, nothing pairs up and the query returns an empty result.
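To make the matching rule concrete, here is a small Python sketch. It is a toy model of one-to-one vector matching, not the Prometheus engine, and the container names, instance label, and sample values are made up for illustration.

```python
# Toy model of PromQL one-to-one vector matching (not the real engine).
# An instant vector is a dict: frozenset of (label, value) pairs -> sample value.

def labels(**kv):
    return frozenset(kv.items())

def divide_one_to_one(lhs, rhs):
    """Divide only where both sides have a series with an identical label set."""
    return {ls: v / rhs[ls] for ls, v in lhs.items() if ls in rhs}

# Hypothetical per-container CPU rates (cores used), as rate(...) might return.
cpu_rate = {
    labels(name="web", instance="host1"): 0.8,
    labels(name="db", instance="host1"): 1.6,
}
# sum(machine_cpu_cores): one series with an empty label set.
total_cores = {labels(): 8.0}

result = divide_one_to_one(cpu_rate, total_cores)
print(result)  # {} : no label sets are equal, so nothing is paired
```

Dividing by the literal 8 works because a number literal is a scalar, and a scalar is applied to every series without any label matching.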
There are two ways to fix this:
Vector matching
Use on() group_left().
on() supplies the list of labels to be used for matching. In our case the list is empty, so every series on the left matches the single series on the right. But since the LHS has more than one series, you need to specify how many-to-one matching should behave.
group_left() says that many series on the left may match the same series on the right: for every LHS series, take the one matching RHS series and use it in the operation.
The resulting query looks like this:
sum by (name) (
rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval])
/ on() group_left() sum(machine_cpu_cores)
)
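The effect of on() group_left() can be sketched in the same toy Python model. This is only an illustration under assumed data (the series and the helper divide_on_group_left are hypothetical): each series is reduced to the labels named in on(), and several left-hand series are allowed to share one right-hand series.

```python
def labels(**kv):
    return frozenset(kv.items())

def divide_on_group_left(lhs, rhs, on=()):
    """Model `lhs / on(<on>) group_left() rhs`: many LHS series per RHS series."""
    def key(ls):
        # Keep only the labels listed in on(); with on=() every key is empty.
        return frozenset((k, v) for k, v in ls if k in on)

    rhs_by_key = {}
    for ls, v in rhs.items():
        k = key(ls)
        if k in rhs_by_key:
            raise ValueError("several RHS series share a match key")
        rhs_by_key[k] = v

    # Results keep the LHS label sets, as group_left does.
    return {ls: v / rhs_by_key[key(ls)]
            for ls, v in lhs.items() if key(ls) in rhs_by_key}

# Hypothetical data: two containers on one 8-core host.
cpu_rate = {
    labels(name="web", instance="host1"): 0.8,
    labels(name="db", instance="host1"): 1.6,
}
total_cores = {labels(): 8.0}

ratio = divide_on_group_left(cpu_rate, total_cores)
for ls, v in ratio.items():
    print(dict(ls)["name"], v)
```

Because the result keeps the left-hand labels, the outer sum by (name) in the query can still group by name afterwards.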
Converting to scalar
Since your divisor is always a single value, you can convert it to a scalar with the function scalar() and skip all the hassle of label matching.
The resulting query looks like this:
sum by (name) (
rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval])
/ scalar(sum(machine_cpu_cores))
)
Note that this solution only works when one of the operands is guaranteed to have a single value, and it may not be the easiest to maintain: if you later decide to add more dimensions to the result set, the query will have to be rewritten.
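The single-value requirement comes from how scalar() behaves: given a vector that does not contain exactly one element, it returns NaN. A minimal Python sketch of that rule follows; the vectors are hypothetical, and the two-host case models calling scalar() on machine_cpu_cores without the surrounding sum().

```python
import math

def scalar(vector):
    """Model of PromQL scalar(): the single sample's value, else NaN."""
    if len(vector) != 1:
        return math.nan
    return next(iter(vector.values()))

# One series, as sum(machine_cpu_cores) produces when data is present.
one_series = {frozenset(): 8.0}
# Two series, e.g. machine_cpu_cores from two hosts without sum().
two_hosts = {
    frozenset({("instance", "host1")}): 8.0,
    frozenset({("instance", "host2")}): 4.0,
}
# No series at all, e.g. the metric is not being scraped.
no_data = {}

cpu_rate = 1.6  # hypothetical cores used by one container
print(cpu_rate / scalar(one_series))  # 0.2
print(cpu_rate / scalar(two_hosts))   # nan
print(cpu_rate / scalar(no_data))     # nan
```

In the NaN cases the panel would show no usable values rather than an error, so an empty or broken graph after adding hosts or labels is a hint to switch to the vector matching form.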