Flink的countWindow详细工作原理如何

huangapple go评论77阅读模式
英文:

How does Flink countWindow work in detail

问题

我正在使用简单的示例学习Flink。

我已经根据这里调整了WindowWordCount示例,并在这个简单的数据文件上运行它:

cat data.txt
a a b c c

我忽略了slideSize,只使用了countWindow(windowSize)(因此根据这篇文章,它被称为滚动窗口)用于多个窗口大小。我对结果感到困惑。

windowSize = 1时,输出为:

(a,1)
(a,1)
(b,1)
(c,1)
(c,1)

这完全合理。但当windowSize = 2时,输出为:

(a,2)
(c,2)

为什么没有b的计数?

windowSize = 3时,输出为空,这也让我不太理解。

请问是否有人可以帮我理解在windowSize = 2windowSize = 3的情况下输出是如何产生的?

我的代码在https://gist.github.com/zyxue/bc566180d2b01f1e2e77c1bbe3a7c5e5

我使用类似以下命令运行它:

./bin/flink run /path/to/target/myflinkapp-1.0-SNAPSHOT.jar --input data.txt  --output /tmp/out --window 1
英文:

I'm learning Flink with simple toy examples.

I have adapted the WindowWordCount example from here and run it on this simple data file

cat data.txt
a a b c c

I ignored the slideSize and only do countWindow(windowSize) (hence it's called tumbling window according to this post for multiple window sizes. I'm confused by the results.

When windowSize = 1, the output is

(a,1)
(a,1)
(b,1)
(c,1)
(c,1)

which makes total sense. But when windowSize = 2, the output is

(a,2)
(c,2)

where is count for b?

when windowSize = 3, the output is empty, which I don't understand either.

Can anyone help me understand how the outputs are produced in the cases of windowSize = 2 and windowSize = 3, please?

My code is on https://gist.github.com/zyxue/bc566180d2b01f1e2e77c1bbe3a7c5e5

And I run it with command like

./bin/flink run /path/to/target/myflinkapp-1.0-SNAPSHOT.jar --input data.txt  --output /tmp/out --window 1

答案1

得分: 2

Trigger for CountWindow仅对完整的窗口触发窗口函数 — 换句话说,经过处理给定键的windowSize个事件后,窗口才会触发。

例如,当windowSize = 2,只有ac有足够的事件。由于只有一个b,作业会以b的部分填充窗口结束。

如果您想为部分计数窗口生成报告,可以使用自定义触发器,该触发器还会对超时做出反应。

英文:

The Trigger for CountWindow only triggers the window function for complete windows -- in other words, after processing windowSize events for a given key, then the window will fire.

For example, with windowSize = 2, only for a and c are there enough events. Since there is only one b, the job ends with a partially filled window for b.

You can use a custom trigger that also reacts to a timeout if you want to generate reports for partial count windows.

huangapple
  • 本文由 发表于 2020年9月15日 13:04:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/63895423.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定