What does @SupportsBatching exactly do on top of NiFi's processor class?

Question

I have looked through NiFi's Developer Guide, User Guide, and NiFi In-Depth, but I could not find anything about the @SupportsBatching annotation.

If I go to the source of that annotation/interface, I find this JavaDoc:
>Marker annotation a Processor implementation can use to indicate that users should be able to supply a Batch Duration for the Processor. If a Processor uses this annotation, it is allowing the Framework to batch ProcessSessions' commits, as well as allowing the Framework to return the same ProcessSession multiple times from subsequent calls to ProcessSessionFactory.createSession(). When this Annotation is used, it is important to note that calls to ProcessSession.commit() may not provide a guarantee that the data has been safely stored in NiFi's Content Repository or FlowFile Repository. Therefore, it is not appropriate, for instance, to use this annotation if the Processor will call ProcessSession.commit() to ensure data is persisted before deleting the data from a remote source. When the defaultDuration parameter is set, the processor is created with the supplied duration time, which can be adjusted afterwards. The supplied values can be selected from DefaultRunDuration.

But this is still not clear to me.

My question is simple: what does @SupportsBatching actually do, in the simplest possible words, please?

Over the last month, I have been getting more and more into NiFi. I have to say that the documentation, community, and support for this technology are quite lacking.

Answer 1

Score: 1

The simplest answer is that the Scheduling Tab of the processor will show a Run Duration slider in the UI when the component is marked with this annotation.

Run Duration works like a batch window: if you set it to 5 seconds, the NiFi framework may keep executing the processor for up to 5 seconds and treat all executions in that window as a single batch, which is more efficient for updating the internal repositories.
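To make the "single batch" idea concrete, here is a minimal toy model in plain Java. It does not use NiFi's API or reproduce its real scheduler. The class name `RunDurationSketch`, the method `countCommits`, and the simulated per-invocation cost are all assumptions made up for illustration. It only counts how many repository commits would be needed to drain a queue when invocations inside one run-duration window share a commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: hypothetical names, no NiFi classes involved. */
final class RunDurationSketch {

    /**
     * Counts simulated repository commits needed to drain a queue.
     *
     * @param flowFiles     number of queued items (hypothetical workload)
     * @param costMillis    simulated cost of one processor invocation
     * @param runDurationMs simulated run duration; 0 means no batching
     */
    static int countCommits(int flowFiles, long costMillis, long runDurationMs) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < flowFiles; i++) {
            queue.add(i);
        }

        int commits = 0;
        while (!queue.isEmpty()) {
            long elapsed = 0;
            // Always run at least once, then keep invoking the "processor"
            // while the run-duration window is still open.
            do {
                queue.poll();          // one simulated invocation
                elapsed += costMillis; // advance the simulated clock
            } while (!queue.isEmpty() && elapsed < runDurationMs);
            commits++;                 // one commit for the whole window
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println("run duration 0 ms:  " + countCommits(100, 1, 0) + " commits");
        System.out.println("run duration 25 ms: " + countCommits(100, 1, 25) + " commits");
    }
}
```

In this sketch, a run duration of 0 produces one commit per invocation, while a 25 ms window with 1 ms invocations groups 25 invocations under each commit.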

This allows you to make a trade-off between latency and throughput...

A run duration of 0 basically means no batching: each flow file moves through the system independently and as fast as possible, which gives low latency.

A higher run duration means an individual flow file may not be transferred from the processor until a batch is complete, which makes the flow file take longer to reach the end of the flow, but may lead to more overall flow files being processed in the same time period (higher throughput).
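The trade-off can also be worked through with some back-of-the-envelope arithmetic. The sketch below is an assumption-laden model, not a measurement of NiFi: it supposes a fixed per-flow-file processing cost and a fixed per-commit overhead (both hypothetical numbers), and all class and method names are invented for this example.

```java
/** Back-of-the-envelope model with made-up costs; not NiFi code. */
final class LatencyThroughputSketch {

    /** Invocations that start before the window closes (at least one). Requires perFileMs > 0. */
    static long batchSize(long perFileMs, long runDurationMs) {
        return Math.max(1, (runDurationMs + perFileMs - 1) / perFileMs);
    }

    /** Time until the first flow file of a batch is committed and can move on. */
    static long worstCaseLatencyMs(long perFileMs, long commitMs, long runDurationMs) {
        return batchSize(perFileMs, runDurationMs) * perFileMs + commitMs;
    }

    /** Flow files completed per second when every batch pays one commit overhead. */
    static double throughputPerSecond(long perFileMs, long commitMs, long runDurationMs) {
        long n = batchSize(perFileMs, runDurationMs);
        return 1000.0 * n / (n * perFileMs + commitMs);
    }

    public static void main(String[] args) {
        long perFileMs = 1; // assumed processing cost per flow file
        long commitMs = 4;  // assumed overhead of one repository commit
        for (long runDurationMs : new long[] {0, 25, 100}) {
            System.out.printf("run duration %d ms: latency <= %d ms, throughput ~ %.0f/s%n",
                    runDurationMs,
                    worstCaseLatencyMs(perFileMs, commitMs, runDurationMs),
                    throughputPerSecond(perFileMs, commitMs, runDurationMs));
        }
    }
}
```

Under these assumed costs, a longer run duration spreads the commit overhead across more flow files, which raises throughput, while the first flow file in each batch waits longer before it is released.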

Additional information...
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#run-duration

huangapple
  • Published on March 7, 2023, 02:33:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75654570.html