Dynamic Filtering using variable in filter function in Flux

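Question

Using the quantile function, I was able to get the 95th-percentile value as a stream.

Now I want to keep only the records that lie below the 95th percentile, so I filter my records against that value. However, at this step I get an error.

Please find the code below: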

    percentile = totalTimeByDoc
      |> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
      |> group(columns: ["documentType"])
      // |> yield()
      |> quantile(column: "processTime", q: 0.95, method: "estimate_tdigest", compression: 9999.0)
      |> limit(n: 1)
      |> rename(columns: {processTime: "pt"})

This gives me the following data:

    0 PurchaseOrder 999

Now I try to filter my records against that value:

     percentile_filered = totalTimeByDoc
      |> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
       |> filter(fn: (r) => r.processTime < percentile[0]["pt"])
         |> yield()

where totalTimeByDoc looks like this:

    |0|PurchaseOrder|testpass22PID230207222747-1|1200|
    |1|PurchaseOrder|testpass22PID230207222747-2|807|
    |2|PurchaseOrder|testpass22PID230207222934-1|671|
    |3|PurchaseOrder|testpass22PID230207222934-2|670|

I get the following error from the query above:

     error @116:41-116:51: expected [{A with pt: B}] (array) but found stream[{A with pt: B}]

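Answer 1

Score: 0

You are only missing the column extraction from the percentile stream. Have a look at Extract scalar values. findColumn() returns the values of one column as an array, so percentile[0] then refers to a plain value rather than a stream. In this case, you could do: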

    percentile = totalTimeByDoc
      |> ...
      |> rename(columns: {processTime: "pt"})
      |> findColumn(fn: (key) => true, column: "pt")

    percentile_filtered = totalTimeByDoc
      |> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
      |> filter(fn: (r) => r.processTime < percentile[0])
      |> yield()
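
To try the approach in isolation, here is a minimal sketch that builds a small in-memory table with array.from() instead of reading from a bucket. The rows reuse the four sample values from the question; the column name processId is an assumption (the question does not name that column), and processTime is written as floats here to keep the comparison types simple. The names p95Values and p95 are hypothetical.

```flux
import "array"

// Stand-in for totalTimeByDoc. "processId" is an assumed column name;
// processTime is given as floats for this sketch.
totalTimeByDoc =
    array.from(
        rows: [
            {documentType: "PurchaseOrder", processId: "testpass22PID230207222747-1", processTime: 1200.0},
            {documentType: "PurchaseOrder", processId: "testpass22PID230207222747-2", processTime: 807.0},
            {documentType: "PurchaseOrder", processId: "testpass22PID230207222934-1", processTime: 671.0},
            {documentType: "PurchaseOrder", processId: "testpass22PID230207222934-2", processTime: 670.0},
        ],
    )

// quantile() returns a stream of tables; findColumn() converts the
// matching table's column into an array of values.
p95Values =
    totalTimeByDoc
        |> filter(fn: (r) => r.documentType == "PurchaseOrder")
        |> group(columns: ["documentType"])
        |> quantile(column: "processTime", q: 0.95, method: "estimate_tdigest")
        |> findColumn(fn: (key) => key.documentType == "PurchaseOrder", column: "processTime")

// Take the single value out of the array so it can be used inside filter().
p95 = p95Values[0]

totalTimeByDoc
    |> filter(fn: (r) => r.documentType == "PurchaseOrder")
    |> filter(fn: (r) => r.processTime < p95)
    |> yield(name: "below_p95")
```

If the whole row is needed rather than a single column, findRecord(fn: ..., idx: 0) returns the first row of the matching table as a record, and the value can then be read from its processTime field.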
