Apache Beam 中的 PubSubIO 写操作是否有 setDelayThreshold() 命令?

huangapple go评论56阅读模式
英文:

Is there setDelayThreshold() command for PubSubIO write in Apache Beam?

问题

是否有在Apache Beam中设置批量PubSubIO写请求的maxBatchSize(),并为其设置时间限制的选项?文档中写道:"> 例如,如果设置为1000,写入接收器将等待直到接收到1000条消息,或者管道完成,以先到者为准。"

因此,如果既没有填满接收器,也没有完成管道,这可能意味着消息将永远存储在那里?对于这种情况,有哪些可能的解决方法?

我希望能够为发送批量消息设置超时。

英文:

Is there an option of not only setting the maxBatchSize() for the bulk PubSubIO write request in Apache Beam, but to also give a time limit for it? In the documentation it is written
> For example, if given 1000 the write sink will wait until 1000 messages have been
> received, or the pipeline has finished, whichever is first.

Hence, if neither the sink is full nor pipeline has finished, it might mean that the messages will be stored there forever? What are the possible workarounds for this situation?

I expect to be able to set a timeout for sending bulk messages.

答案1

得分: 0

这是一个流式处理管道吗?如果是的话,默认情况下,它使用PubsubUnboundedSink,并具有默认的最大延迟时间为2秒。在这种情况下,2秒基本上是您描述的“时间限制”。

如果这是一个批处理,工作在捆绑包内进行处理(参见链接),写操作发生在捆绑包完成时。

因此,您所描述的消息永久存储的情况不应该出现。如果您注意到这种行为,请提供更多信息更新。

英文:

Is it a Streaming pipeline? If so, by default, it uses PubsubUnboundedSink and has a default maximum latency of 2 seconds. In this case, 2 seconds is basically the "time limit" that you are describing.

If it is a batch, work is processed in bundles (see link) and the writes happen upon bundle finalization.

So what you are describing of messages being stored forever should not be observed. Please update with more information if you are noticing such behavior.

huangapple
  • 本文由 发表于 2023年2月27日 18:26:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75579272.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定