Flux aggregate function to calculate the difference from last to first
Question
I'm trying to learn the Flux query language for InfluxDB. I'm using InfluxDB OSS 2.7.
I have a time series with power usage from my power meter. It reports an ever-increasing number in kWh, and I want to show how many Wh I have used per day by using a custom function with `aggregateWindow`. Here is what I have tried:
```
myFunc = (tables=<-, column) => {
    a = tables
        |> first(column: column)
        |> findRecord(fn: (key) => true, idx: 0)
    b = tables
        |> last(column: column)
        |> findRecord(fn: (key) => true, idx: 0)
    d = b._value - a._value

    return tables
        |> first()
        |> map(fn: (r) => ({r with _value: d}))
}

from(bucket: "a")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "el")
    |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
    |> aggregateWindow(every: 1d, fn: myFunc, createEmpty: false)
    |> yield(name: "Wh")
```
But this returns a new table where every `_value` is the same number (in my case 134).
I was hoping that the variables `a` and `b` would hold the first and the last value of each window, and that `d` would represent the usage in each window, but this does not seem to be the case.
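The symptom is consistent with how `findRecord` behaves. As far as I understand `aggregateWindow`, it windows the data and then calls `fn` once on the whole windowed stream, with one table per window. `findRecord` returns a record from the *first* table that matches its predicate. So `a`, `b`, and `d` are computed once, from the first window only, and `first() |> map()` then writes that same `d` into every window.

The plain-Python sketch below models this under a simplifying assumption: the stream is a list of windows, each a list of `_value` readings in time order. `my_func_as_written` and `last_minus_first_per_window` are hypothetical names, not Flux APIs.

```python
from typing import List

# Hypothetical model of a Flux stream after window(): one inner list per
# window (table), holding that window's _value readings in time order.
Stream = List[List[float]]


def my_func_as_written(stream: Stream) -> List[float]:
    """Mimics the question's myFunc: findRecord only ever looks at the first table."""
    a = stream[0][0]    # first() |> findRecord(idx: 0): first value of the FIRST window
    b = stream[0][-1]   # last() |> findRecord(idx: 0): last value of the FIRST window
    d = b - a           # one scalar, computed once for the whole stream
    return [d for _ in stream]  # first() |> map() stamps the same d on every window


def last_minus_first_per_window(stream: Stream) -> List[float]:
    """What the question expected: one last-minus-first value per window."""
    return [window[-1] - window[0] for window in stream]


windows = [[100.0, 110.0, 123.0], [124.0, 150.0], [148.0, 200.0]]
print(my_func_as_written(windows))           # [23.0, 23.0, 23.0]
print(last_minus_first_per_window(windows))  # [23.0, 26.0, 52.0]
```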
Answer 1 (score: 2)
If you want to find, for each window, the difference between the first and last values, you can use the `spread` function.
To be precise, `spread` calculates the difference between the minimum and maximum values (not the first and last), but in an always-increasing series the two are the same.
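Here is a quick plain-Python check of that claim. The helper names are hypothetical, not Flux functions. For a counter that never decreases, the minimum is the first reading and the maximum is the last, so `spread` and last-minus-first agree. For a series that goes up and down, they can differ.

```python
from typing import Sequence


def spread(values: Sequence[float]) -> float:
    """Same definition as Flux spread(): maximum minus minimum of the window."""
    return max(values) - min(values)


def last_minus_first(values: Sequence[float]) -> float:
    """Last reading minus first reading, assuming values are in time order."""
    return values[-1] - values[0]


meter = [100, 101, 105, 105, 110]  # cumulative counter: never decreases
noisy = [100, 120, 90, 110]        # not monotonic

print(spread(meter), last_minus_first(meter))  # 10 10
print(spread(noisy), last_minus_first(noisy))  # 30 10
```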
Using `spread` this way, though, is not exact. Consider the following data:
timestamp | value |
---|---|
2023-04-12T00:00:00Z | 100 |
2023-04-12T01:00:00Z | 101 |
2023-04-12T02:00:00Z | 102 |
2023-04-12T03:00:00Z | 103 |
2023-04-12T04:00:00Z | 104 |
2023-04-12T05:00:00Z | 105 |
2023-04-12T06:00:00Z | 106 |
2023-04-12T07:00:00Z | 107 |
2023-04-12T08:00:00Z | 108 |
2023-04-12T09:00:00Z | 109 |
2023-04-12T10:00:00Z | 110 |
2023-04-12T11:00:00Z | 111 |
2023-04-12T12:00:00Z | 112 |
2023-04-12T13:00:00Z | 113 |
2023-04-12T14:00:00Z | 114 |
2023-04-12T15:00:00Z | 115 |
2023-04-12T16:00:00Z | 116 |
2023-04-12T17:00:00Z | 117 |
2023-04-12T18:00:00Z | 118 |
2023-04-12T19:00:00Z | 119 |
2023-04-12T20:00:00Z | 120 |
2023-04-12T21:00:00Z | 121 |
2023-04-12T22:00:00Z | 122 |
2023-04-12T23:00:00Z | 123 |
2023-04-13T00:00:00Z | 124 |
2023-04-13T01:00:00Z | 125 |
2023-04-13T02:00:00Z | 126 |
2023-04-13T03:00:00Z | 127 |
2023-04-13T04:00:00Z | 128 |
2023-04-13T05:00:00Z | 129 |
2023-04-13T06:00:00Z | 130 |
2023-04-13T07:00:00Z | 131 |
2023-04-13T08:00:00Z | 132 |
2023-04-13T09:00:00Z | 133 |
2023-04-13T10:00:00Z | 134 |
2023-04-13T11:00:00Z | 135 |
2023-04-13T12:00:00Z | 136 |
2023-04-13T13:00:00Z | 137 |
2023-04-13T14:00:00Z | 138 |
2023-04-13T15:00:00Z | 139 |
2023-04-13T16:00:00Z | 140 |
2023-04-13T17:00:00Z | 141 |
2023-04-13T18:00:00Z | 142 |
2023-04-13T19:00:00Z | 143 |
2023-04-13T20:00:00Z | 144 |
2023-04-13T21:00:00Z | 145 |
2023-04-13T22:00:00Z | 146 |
2023-04-13T23:00:00Z | 147 |
If you then take the spread of each day with the query below, you get:
```
from(bucket: "a")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "el")
    |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
    // spread = max - min of the readings inside each 1-day window
    |> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
    |> yield(name: "Wh")
```
day | first | last | spread |
---|---|---|---|
2023-04-13T00:00:00Z | 100 | 123 | 23 |
2023-04-14T00:00:00Z | 124 | 147 | 23 |
The actual consumption for each day, however, is 24. By using `spread` you miss one interval per day (the consumption from 123 to 124 is never accounted for). Of course, if your data has a higher granularity (e.g. every minute or every second), the missing interval will be much less significant.
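To make the missing interval concrete, the sketch below rebuilds the 48 hourly readings from the table and computes a per-day spread in plain Python. It illustrates the arithmetic, not Flux itself, and all names are hypothetical. Readings are keyed by the UTC day they fall in, whereas `aggregateWindow` would stamp each row with the window's stop time.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Rebuild the example data: one cumulative reading per hour,
# 100 at 2023-04-12T00:00Z up to 147 at 2023-04-13T23:00Z.
start = datetime(2023, 4, 12, tzinfo=timezone.utc)
readings = [(start + timedelta(hours=h), 100 + h) for h in range(48)]

# Group the readings by UTC day, mirroring daily windows.
by_day: Dict[str, List[int]] = {}
for ts, value in readings:
    by_day.setdefault(ts.date().isoformat(), []).append(value)

spread_per_day = {day: max(v) - min(v) for day, v in by_day.items()}
total_spread = sum(spread_per_day.values())
true_usage = readings[-1][1] - readings[0][1]  # total rise of the counter

print(spread_per_day)            # {'2023-04-12': 23, '2023-04-13': 23}
print(total_spread, true_usage)  # 46 47
```

The two daily spreads add up to one less than the total rise of the counter. The missing unit is the 123 to 124 step, which straddles midnight and so falls inside neither day's spread.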
To solve this, I would suggest getting just a single value for each day (the last one) and then using the `difference` function. It performs a "rolling difference", subtracting from each value the one before it, which gives you a better result (all hours are accounted for).
```
from(bucket: "a")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "el")
    |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
    // keep only the last meter reading of each day
    |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
    // subtract the previous day's last reading from each day's last reading
    |> difference()
    |> yield(name: "Wh")
```
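The same two steps can be replayed in plain Python on the example data, as a sanity check of the arithmetic rather than a Flux implementation. `last_per_day` stands in for `aggregateWindow(every: 1d, fn: last)` and `daily_usage` for `difference()`; both names are hypothetical. Grouping by the date prefix of each timestamp assumes UTC day boundaries, as the example table does.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Example data again: hourly cumulative readings 100..147 over two UTC days.
readings: List[Tuple[str, int]] = [
    (f"2023-04-{12 + h // 24:02d}T{h % 24:02d}:00:00Z", 100 + h) for h in range(48)
]

# Step 1 (like aggregateWindow(every: 1d, fn: last)): last reading of each day.
last_per_day = [
    list(group)[-1][1] for _, group in groupby(readings, key=lambda r: r[0][:10])
]

# Step 2 (like difference()): subtract each value's predecessor.
# The first day has no predecessor, so it gets None here.
daily_usage: List[Optional[int]] = [None] + [
    cur - prev for prev, cur in zip(last_per_day, last_per_day[1:])
]

print(last_per_day)  # [123, 147]
print(daily_usage)   # [None, 24]
```

The results stay in the meter's own unit. Since the question's meter reports kWh, multiplying by 1000 gives Wh.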
Aggregating with `last` will give:
day | last |
---|---|
2023-04-13T00:00:00Z | 123 |
2023-04-14T00:00:00Z | 147 |
Then applying `difference` will result in:
day | difference(last) |
---|---|
2023-04-13T00:00:00Z | |
2023-04-14T00:00:00Z | 24 |
> NOTE: the first day has no earlier value to take the difference with, so it gets no result. By default `difference` drops that row; with `keepFirst: true` it keeps the row with a null value.