Why does a Beam IO need beam.AddFixedKey + beam.GroupByKey to work properly?

Question

I'm working on a Beam IO for Elasticsearch in Go. I have a working draft, but I only got it working by adding something whose purpose isn't clear to me.
I looked at existing IOs and found that writes only work if I add the following:

x := beam.AddFixedKey(s, pColl)
y := beam.GroupByKey(s, x)

A full example is in the existing BigQuery IO.

I would like to understand why AddFixedKey followed by GroupByKey is needed to make it work. I also checked issue BEAM-3860, but it doesn't have much more detail.

Answer 1

Score: 1

Together, those two transforms group all elements of a PCollection into a single list: AddFixedKey gives every element the same key, and GroupByKey then collects everything under that key. In the BigQuery example you posted, this groups the entire input PCollection into one list that is iterated over in the ProcessElement method.
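To make the grouping concrete, here is a minimal sketch in plain Go that models what the two steps do to the data. It deliberately does not use the Beam SDK: the kv type and the addFixedKey and groupByKey functions are hypothetical stand-ins written for illustration, and a real pipeline would distribute this work rather than run it in one loop.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for a keyed element; it is not a Beam type.
type kv struct {
	key   int
	value string
}

// addFixedKey pairs every element with the same key (0),
// so that a later grouping step puts them all in one group.
func addFixedKey(elems []string) []kv {
	out := make([]kv, 0, len(elems))
	for _, e := range elems {
		out = append(out, kv{key: 0, value: e})
	}
	return out
}

// groupByKey collects the values that share a key.
func groupByKey(pairs []kv) map[int][]string {
	groups := make(map[int][]string)
	for _, p := range pairs {
		groups[p.key] = append(groups[p.key], p.value)
	}
	return groups
}

func main() {
	docs := []string{"doc-a", "doc-b", "doc-c"}
	groups := groupByKey(addFixedKey(docs))
	// There is only one key, so this loop body runs once and sees every element.
	for key, values := range groups {
		fmt.Printf("key=%d batch=%v\n", key, values)
	}
}
```

Because every element carries the same key, the grouping produces exactly one group, which is what lets a single downstream call see the whole input.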

Whether to use this approach depends on how you are implementing the IO. The BigQuery example you posted performs its writes as a single batch once all elements are available, but that may not be the best approach for your use case. You might prefer to write elements one at a time as they come in, especially if you can parallelize writes among different workers. In that case you would want to avoid grouping the input PCollection together.
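The trade-off between the two write styles can be sketched in plain Go as well. This is an illustration under stated assumptions, not Beam or Elasticsearch code: the sink type is a hypothetical stand-in for a client and only counts calls, and goroutines stand in for parallel workers.

```go
package main

import (
	"fmt"
	"sync"
)

// sink is a hypothetical stand-in for an Elasticsearch client; it only counts calls.
type sink struct {
	mu       sync.Mutex
	requests int
	docs     int
}

// write records one request carrying the given documents.
func (s *sink) write(docs ...string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.requests++
	s.docs += len(docs)
}

// writeBatch mirrors the grouped approach: one call that sees every element.
func writeBatch(s *sink, docs []string) {
	s.write(docs...)
}

// writePerElement mirrors the ungrouped approach: each element is written
// independently, so the work can be spread across workers (goroutines here).
func writePerElement(s *sink, docs []string) {
	var wg sync.WaitGroup
	for _, d := range docs {
		wg.Add(1)
		go func(doc string) {
			defer wg.Done()
			s.write(doc)
		}(d)
	}
	wg.Wait()
}

func main() {
	docs := []string{"doc-a", "doc-b", "doc-c"}
	batch, single := &sink{}, &sink{}
	writeBatch(batch, docs)
	writePerElement(single, docs)
	fmt.Printf("batch: %d request(s), %d docs\n", batch.requests, batch.docs)
	fmt.Printf("per-element: %d request(s), %d docs\n", single.requests, single.docs)
}
```

The grouped style issues one request but cannot start until everything has been collected in one place. The per-element style issues more requests but has no such bottleneck.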

Posted by huangapple on June 2, 2021, 06:43:30
Please keep this link when reposting: https://go.coder-hub.com/67796876.html