英文:
KStream-KStream-Join with different Results on consecutive executions
问题
因为我不知道我的问题是否会被重新开启... 这里有一个更精确的问题。
我有StreamA(包含在30分钟间隔内生产的产品)和StreamB(包含来自4个不同传感器的测量数据,每个传感器每5分钟产生一次测量数据)。这两个流根据一个共同的键进行连接。StreamC是这个连接的结果,包含了测量增强的产品。
我有大约15,000个产品和约250,000条测量数据。以下是结果:
<pre>
运行 StreamC中的记录数
1 149,389
2 149,362
3 149,363
4 149,411
</pre>
每次运行的配置都完全相同,流A/B中的事件也相同。
我真的不知道为什么会出现这种情况。有可能底层状态存储出现了问题吗?
英文:
Because I don't know whether my question will be reopened.. here a more precise question.
I have StreamA (containing a product which is produced within a 30 minutes interval) and StreamB (containing measurements from 4 different sensors, producing a measurement every 5 minutes each). These two streams are joined on a common key. StreamC is the result of this join and contains measurementEnrichedProducts.
I have ~15k products and ~250k measurements. Below are the results:
<pre>
Run Num records within StreamC
1 149,389
2 149,362
3 149,363
4 149,411
</pre>
Each run had the exact same config and the events in streamA/B were the same too.
I really do not know why this is the case. Is it possible that there are any problems with the underlying statestores?
答案1
得分: 0
我重新启动应用程序的速度太快了...
在调整max.task.idle.ms属性时,我注意到结果是稳定的(每次执行时数量相同),但比以前少。在让应用程序运行了超过15分钟(max.task.idle.ms=600000[10分钟])后,我收到了更多的结果,并且streamC中的记录数量也保持稳定。
再次移除max.task.idle.ms并且等待足够长的时间会导致相同的结果。
我怀疑问题是由于无序的输入数据和内部缓冲区未被填充导致的。
英文:
I was restarting the application too fast...
When playing with the max.task.idle.ms-property I noticed that the results were stable (same amount every execution) but less than before. After letting the application run for more than 15 minutes (max.task.idle.ms=600000[10minutes]) I received some more results and the number of records in streamC were stable too.
Removing max.task.idle.ms again and waiting long enough lead to the same results.
I suspect the problem occurred due to the out-of-order input data and internal buffers not being filled.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论