April 20, 2023 05:05:13

Flink Parallelism with event time window

Question

I'm reading from a Kafka topic that has a single partition, using event-time windows without a key. With this code, the windows never closed (never emitted their results). I struggled with it for a long time. In the end I added env.setParallelism(1); and suddenly everything worked.

I want to understand why this setting is needed in my example. Why didn't the windows close without it?

Also, I found in the documentation that non-keyed windows always run with a parallelism of 1.

I should also add that with TumblingProcessingTimeWindows everything works perfectly, whether or not env.setParallelism(1); is set.

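    import java.time.Duration;
    import java.time.ZoneOffset;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Inside main(); UserModel, UserModelEx, JsonConverter, Rich and kafka come from my own project
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1); // with this line the windows fire; without it they never did

    KafkaSource<UserModel> source = KafkaSource.<UserModel>builder()
            .setBootstrapServers(kafka)
            .setTopics("f1")
            .setGroupId("flink_group")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new JsonConverter())
            .build();

    DataStream<UserModel> ds = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");

    // Event time is the record's dt field (a LocalDateTime read as UTC), with 20 s of allowed out-of-orderness
    WatermarkStrategy<UserModel> strategy = WatermarkStrategy.<UserModel>forBoundedOutOfOrderness(Duration.ofSeconds(20))
            .withTimestampAssigner((i, timestamp) -> i.dt.toInstant(ZoneOffset.UTC).toEpochMilli());

    // Non-keyed (windowAll) 10-second tumbling event-time windows
    SingleOutputStreamOperator<UserModelEx> reduce = ds.assignTimestampsAndWatermarks(strategy)
            .windowAll(TumblingEventTimeWindows.of(Time.seconds(10)))
            .reduce((acc, i) -> {
                acc.count += i.count;
                acc.dt = i.dt;
                System.out.println(acc.dt + " reduce:" + acc.count);
                return acc;
            }, new Rich());

    reduce.print();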

Answer 1

Score: 1

I can't think of a reason why changing the parallelism would cause your window trigger to start working. I'm guessing it's some other issue with your code.

But looking at your code, this line isn't good:

    DataStream<UserModel> ds = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");

Instead of passing in the noWatermarks() strategy, you should pass the strategy you define just after that line.
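One possible explanation for the parallelism effect, offered as an assumption rather than as part of this answer: Flink's documentation on watermarks in parallel streams says an operator with several inputs uses the minimum of their watermarks as its current event time. windowAll itself runs at parallelism 1, but the source and the timestamp/watermark assigner in front of it run at the environment's default parallelism, and with a one-partition topic only one of those subtasks ever reads a record. The plain-Java model below (the class and its methods are hypothetical, and it simplifies what Flink actually does) shows the silent subtasks pinning the combined watermark at Long.MIN_VALUE, so no event-time window can fire. Processing-time windows fire on the wall clock, which would fit the observation that they work at any parallelism. Flink's real knob for excluding such inputs is WatermarkStrategy#withIdleness(Duration); lowering the parallelism, as in the question, also removes them.

```java
/**
 * Simplified model (not Flink's actual code) of how an operator with several
 * input channels derives its event-time watermark: the minimum of the latest
 * watermark on each channel, skipping channels that are marked idle.
 */
public class CombinedWatermark {

    /** Watermark a channel reports before it has seen any event. */
    static final long NO_WATERMARK = Long.MIN_VALUE;

    /**
     * @param channelWatermarks latest watermark per upstream subtask
     * @param idle              whether each upstream subtask is marked idle
     * @return combined watermark; NO_WATERMARK if every channel is idle
     *         (a simplification: Flink would mark the output idle instead)
     */
    static long combine(long[] channelWatermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int c = 0; c < channelWatermarks.length; c++) {
            if (idle[c]) {
                continue; // idle channels do not hold back event time
            }
            anyActive = true;
            min = Math.min(min, channelWatermarks[c]);
        }
        return anyActive ? min : NO_WATERMARK;
    }

    public static void main(String[] args) {
        // Hypothetical setup: parallelism 4 and one Kafka partition, so only
        // subtask 0 reads records; the other three never advance.
        long[] wms = {25_000L, NO_WATERMARK, NO_WATERMARK, NO_WATERMARK};

        // No channel idle: the silent subtasks win the minimum.
        System.out.println(combine(wms, new boolean[4])); // prints -9223372036854775808

        // Silent subtasks marked idle: event time follows subtask 0.
        boolean[] othersIdle = {false, true, true, true};
        System.out.println(combine(wms, othersIdle)); // prints 25000

        // Parallelism 1: a single channel, nothing to hold it back.
        System.out.println(combine(new long[] {25_000L}, new boolean[1])); // prints 25000
    }
}
```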

Also, the acc.dt = i.dt line is only valid if your reduce has a single input partition and you know that events in the Kafka topic are ordered by time:

    .reduce((acc, i) -> {
        acc.count += i.count;
        acc.dt = i.dt; // the last element to arrive wins, so this depends on arrival order
        System.out.println(acc.dt + " reduce:" + acc.count);
        return acc;
    }, new Rich());

Otherwise, you're setting the accumulator time from an arbitrary element in your window.
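If the topic cannot be guaranteed to be time-ordered, one option (a swapped-in technique, not something the answer prescribes) is to make the reduce order-independent by keeping the maximum event time instead of the last one seen. The plain-Java sketch below models UserModel with a hypothetical Acc class holding the count and the event time as epoch milliseconds; in the real job the same idea would apply to the LocalDateTime field, for example by comparing with isAfter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MaxTimeReduce {

    /** Hypothetical stand-in for UserModel: a count and an event time in epoch ms. */
    static final class Acc {
        long count;
        long dtMillis;

        Acc(long count, long dtMillis) {
            this.count = count;
            this.dtMillis = dtMillis;
        }
    }

    /** Order-independent combine: sum the counts, keep the latest event time. */
    static Acc reduce(Acc acc, Acc i) {
        acc.count += i.count;
        acc.dtMillis = Math.max(acc.dtMillis, i.dtMillis);
        return acc;
    }

    /** Folds the elements in list order, starting from a copy of the first one. */
    static Acc fold(List<Acc> elements) {
        Acc first = elements.get(0);
        Acc acc = new Acc(first.count, first.dtMillis);
        for (int k = 1; k < elements.size(); k++) {
            reduce(acc, elements.get(k));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Acc> window = new ArrayList<>(List.of(
                new Acc(1, 3_000L), new Acc(2, 9_000L), new Acc(3, 1_000L)));
        Acc inOrder = fold(window);
        Collections.shuffle(window, new Random(42)); // simulate a different arrival order
        Acc shuffled = fold(window);
        // Both lines print "6 9000": the result no longer depends on arrival order.
        System.out.println(inOrder.count + " " + inOrder.dtMillis);
        System.out.println(shuffled.count + " " + shuffled.dtMillis);
    }
}
```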

But if that is true, then you can and should use the built-in forMonotonousTimestamps watermark strategy instead of forBoundedOutOfOrderness; it gives you the most accurate watermark without the extra out-of-orderness delay.
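The latency difference can be worked out directly. As I understand Flink's built-in generators, a bounded-out-of-orderness watermark is the largest timestamp seen minus the allowed delay minus 1 ms, forMonotonousTimestamps behaves like a delay of zero, and an event-time window [start, end) fires once the watermark reaches end - 1. The plain-Java sketch below models only that arithmetic for the question's 10-second windows and 20-second delay; the class and method names are illustrative, not Flink API, and it ignores the periodic watermark emission interval.

```java
/** Worked arithmetic (a simplified model, not Flink source code) for window firing. */
public class WindowFiring {

    /** Start of the tumbling window that contains ts, assuming a zero window offset. */
    static long windowStart(long ts, long sizeMillis) {
        return ts - Math.floorMod(ts, sizeMillis);
    }

    /** Watermark after seeing maxTs with the given allowed out-of-orderness. */
    static long watermark(long maxTs, long outOfOrdernessMillis) {
        return maxTs - outOfOrdernessMillis - 1;
    }

    /** A window [start, end) fires once the watermark is at least end - 1. */
    static boolean fires(long windowEnd, long currentWatermark) {
        return currentWatermark >= windowEnd - 1;
    }

    /** Smallest max event timestamp that lets the window ending at windowEnd fire. */
    static long minTsToFire(long windowEnd, long outOfOrdernessMillis) {
        // maxTs - ooo - 1 >= end - 1  <=>  maxTs >= end + ooo
        return windowEnd + outOfOrdernessMillis;
    }

    public static void main(String[] args) {
        long size = 10_000L;
        long start = windowStart(4_321L, size); // 0
        long end = start + size;                // 10000
        System.out.println(minTsToFire(end, 20_000L)); // prints 30000 (20 s out-of-orderness)
        System.out.println(minTsToFire(end, 0L));      // prints 10000 (monotonous timestamps)
        System.out.println(fires(end, watermark(29_999L, 20_000L))); // prints false
        System.out.println(fires(end, watermark(30_000L, 20_000L))); // prints true
    }
}
```

So with a 20-second bound, the first window only closes once an event 20 seconds past its end has been read, while monotonous timestamps close it as soon as an event at or after the window end arrives.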
