Flux aggregate function to calculate the difference from last to first

Question

I'm trying to learn the Flux query language for InfluxDB. I'm using InfluxDB OSS 2.7.

I have a time series with power usage from my power meter. It reports an ever-increasing number in kWh, and I want to show how many Wh I have used per day by using a custom function with aggregateWindow. Here is what I have tried:

myFunc = (tables=<-, column) => {
  a = tables
    |> first(column: column)
    |> findRecord(fn: (key) => true, idx: 0)

  b = tables
    |> last(column: column)
    |> findRecord(fn: (key) => true, idx: 0)

  d = b._value - a._value

  return tables
    |> first()
    |> map(fn: (r) => ({ r with _value: d}))
}

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: myFunc, createEmpty: false)
  |> yield(name: "Wh")

But this returns a new table in which every _value is the same number (134 in my case).

I was hoping that the variables a and b would have the first and the last value of each window, and that d would represent the usage in each window - but this does not seem to be the case.

Answer 1

Score: 2

If you want the difference between the first and last values of each window, you can use the spread function.
To be precise, spread calculates the difference between the minimum and maximum values (not the first and last), but in an always-increasing series the two are the same.
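
The equivalence above can be checked with a small plain-Python sketch. This is not Flux: `spread` and `last_minus_first` here are hypothetical helpers that only mimic the arithmetic, and the `with_reset` series is an invented example of a meter counter restarting.

```python
def spread(values):
    """Maximum minus minimum, mimicking the arithmetic of Flux's spread()."""
    return max(values) - min(values)


def last_minus_first(values):
    """Last value minus first value, which is what the question asks for."""
    return values[-1] - values[0]


increasing = [100, 101, 102, 103]   # an always-increasing meter reading
with_reset = [100, 101, 2, 3]       # hypothetical: the counter restarts

# For a non-decreasing series, min is the first value and max is the last.
print(spread(increasing), last_minus_first(increasing))   # 3 3

# Once the series is no longer monotonic, the two quantities diverge.
print(spread(with_reset), last_minus_first(with_reset))   # 99 -97
```

So the substitution is only safe as long as the counter never decreases inside a window.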

This is not exact, though. Consider the following data:

timestamp value
2023-04-12T00:00:00Z 100
2023-04-12T01:00:00Z 101
2023-04-12T02:00:00Z 102
2023-04-12T03:00:00Z 103
2023-04-12T04:00:00Z 104
2023-04-12T05:00:00Z 105
2023-04-12T06:00:00Z 106
2023-04-12T07:00:00Z 107
2023-04-12T08:00:00Z 108
2023-04-12T09:00:00Z 109
2023-04-12T10:00:00Z 110
2023-04-12T11:00:00Z 111
2023-04-12T12:00:00Z 112
2023-04-12T13:00:00Z 113
2023-04-12T14:00:00Z 114
2023-04-12T15:00:00Z 115
2023-04-12T16:00:00Z 116
2023-04-12T17:00:00Z 117
2023-04-12T18:00:00Z 118
2023-04-12T19:00:00Z 119
2023-04-12T20:00:00Z 120
2023-04-12T21:00:00Z 121
2023-04-12T22:00:00Z 122
2023-04-12T23:00:00Z 123
2023-04-13T00:00:00Z 124
2023-04-13T01:00:00Z 125
2023-04-13T02:00:00Z 126
2023-04-13T03:00:00Z 127
2023-04-13T04:00:00Z 128
2023-04-13T05:00:00Z 129
2023-04-13T06:00:00Z 130
2023-04-13T07:00:00Z 131
2023-04-13T08:00:00Z 132
2023-04-13T09:00:00Z 133
2023-04-13T10:00:00Z 134
2023-04-13T11:00:00Z 135
2023-04-13T12:00:00Z 136
2023-04-13T13:00:00Z 137
2023-04-13T14:00:00Z 138
2023-04-13T15:00:00Z 139
2023-04-13T16:00:00Z 140
2023-04-13T17:00:00Z 141
2023-04-13T18:00:00Z 142
2023-04-13T19:00:00Z 143
2023-04-13T20:00:00Z 144
2023-04-13T21:00:00Z 145
2023-04-13T22:00:00Z 146
2023-04-13T23:00:00Z 147

Then, if you take the spread within each day, you will get:

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
  |> yield(name: "Wh")

day first last spread
2023-04-13T00:00:00Z 100 123 23
2023-04-14T00:00:00Z 124 147 23

The actual consumption for each day, however, is 24. By using spread you miss one interval per day (the consumption from 123 to 124 is never counted). Of course, if your data has higher granularity (e.g. every minute or every second), the missing value will be far less significant.
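
The missing interval can be reproduced with a plain-Python simulation of the sample data. This is only a sketch of the arithmetic, not Flux: it groups the 48 hourly readings by calendar date (the dictionary keys are dates, not the window stop times that aggregateWindow would emit) and takes max minus min per group.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Rebuild the sample series: hourly readings 100..147 from 2023-04-12T00:00Z.
start = datetime(2023, 4, 12, tzinfo=timezone.utc)
readings = [(start + timedelta(hours=i), 100 + i) for i in range(48)]

# Group the values into one bucket per UTC calendar day.
by_day = defaultdict(list)
for ts, value in readings:
    by_day[ts.date().isoformat()].append(value)

# Per-day spread: maximum minus minimum within each bucket.
daily_spread = {day: max(vals) - min(vals) for day, vals in by_day.items()}

total_spread = sum(daily_spread.values())        # what the spread approach reports
total_actual = readings[-1][1] - readings[0][1]  # what the meter actually advanced

print(daily_spread)                 # {'2023-04-12': 23, '2023-04-13': 23}
print(total_spread, total_actual)   # 46 47
```

The one unit that goes missing is the step from 123 to 124, which crosses the day boundary and therefore belongs to neither window's spread.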

To solve this, I would suggest getting a single value for each day (the last) and then using the difference function. This computes a "rolling difference", subtracting the previous value from each value, and gives a better result because every hour is accounted for.

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  |> difference()
  |> yield(name: "Wh")

Aggregating with last will give:

day last
2023-04-13T00:00:00Z 123
2023-04-14T00:00:00Z 147

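Then applying difference will result in:

day difference(last)
2023-04-13T00:00:00Z (no value)
2023-04-14T00:00:00Z 24

> NOTE: the first value will be null, since there is nothing before it to take the difference with. By default, difference() drops that first row entirely; pass keepFirst: true to keep it with a null value.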
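
The same two steps can be sketched in plain Python on the sample data. This is a simulation of the arithmetic under stated assumptions rather than Flux itself: days are keyed by calendar date instead of window stop time, and the first day is kept with `None` to make the missing predecessor visible.

```python
from datetime import datetime, timedelta, timezone

# Rebuild the sample series: hourly readings 100..147 from 2023-04-12T00:00Z.
start = datetime(2023, 4, 12, tzinfo=timezone.utc)
readings = [(start + timedelta(hours=i), 100 + i) for i in range(48)]

# Step 1: keep only the last reading of each day. The readings are in
# chronological order, so later values overwrite earlier ones.
last_per_day = {}
for ts, value in readings:
    last_per_day[ts.date().isoformat()] = value

# Step 2: rolling difference between consecutive days. The first day has
# no predecessor, so it gets None here.
days = sorted(last_per_day)
daily_usage = {days[0]: None}
for prev, cur in zip(days, days[1:]):
    daily_usage[cur] = last_per_day[cur] - last_per_day[prev]

print(last_per_day)   # {'2023-04-12': 123, '2023-04-13': 147}
print(daily_usage)    # {'2023-04-12': None, '2023-04-13': 24}
```

Because each day's figure is measured from the previous day's last reading, the step across midnight is included, which is where the 24 comes from.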

Posted by huangapple on April 11, 2023 at 02:22:43
When reposting, please keep the link to this article: https://go.coder-hub.com/75979653.html