英文:
Is there setDelayThreshold() command for PubSubIO write in Apache Beam?
问题
是否有在Apache Beam中设置批量PubSubIO写请求的maxBatchSize(),并为其设置时间限制的选项?文档中写道:"> 例如,如果设置为1000,写入接收器将等待直到接收到1000条消息,或者管道完成,以先到者为准。"
因此,如果既没有填满接收器,也没有完成管道,这可能意味着消息将永远存储在那里?对于这种情况,有哪些可能的解决方法?
我希望能够为发送批量消息设置超时。
英文:
Is there an option of not only setting the maxBatchSize() for the bulk PubSubIO write request in Apache Beam, but to also give a time limit for it? In the documentation it is written
> For example, if given 1000 the write sink will wait until 1000 messages have been
> received, or the pipeline has finished, whichever is first.
Hence, if neither the sink is full nor pipeline has finished, it might mean that the messages will be stored there forever? What are the possible workarounds for this situation?
I expect to be able to set a timeout for sending bulk messages.
答案1
得分: 0
这是一个流式处理管道吗?如果是的话,默认情况下,它使用PubsubUnboundedSink,并具有默认的最大延迟时间为2秒。在这种情况下,2秒基本上是您描述的“时间限制”。
如果这是一个批处理,工作在捆绑包内进行处理(参见链接),写操作发生在捆绑包完成时。
因此,您所描述的消息永久存储的情况不应该出现。如果您注意到这种行为,请提供更多信息更新。
英文:
Is it a Streaming pipeline? If so, by default, it uses PubsubUnboundedSink and has a default maximum latency of 2 seconds. In this case, 2 seconds is basically the "time limit" that you are describing.
If it is a batch, work is processed in bundles (see link) and the writes happen upon bundle finalization.
So what you are describing of messages being stored forever should not be observed. Please update with more information if you are noticing such behavior.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论