Dataflow: send a PubSub message after BigQuery write completion

Question

I have a Dataflow job that transforms data and writes it out to BigQuery (batch job). Once the write operation completes, I want to send a message to PubSub that will trigger further processing of the data in BigQuery. I have seen a few older questions/answers that hint at this being possible, but only for streaming jobs.

I'm wondering whether this is now supported in any way for batch write jobs. Unfortunately I can't use Apache Airflow to orchestrate all of this, so sending a PubSub message seemed like the easiest way.

Answer 1

Score: 2

The design of Beam makes it impossible to do what you want inside the pipeline itself. You are writing a PCollection to BigQuery, and by definition a PCollection is either a bounded or an unbounded collection. How can you trigger something after an unbounded collection? When do you know that you have reached the end?

So you have different ways to achieve this. In your code, you can wait for the pipeline to complete and then publish a PubSub message.
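A minimal sketch of this first option in Python, assuming the pipeline is launched from your own driver code. `pipeline.run()` and `wait_until_finish()` are the Beam Python SDK calls, and the sketch assumes the terminal state comes back as the string `"DONE"` on success. The function names, the payload fields, and the topic are hypothetical and only there for illustration; the Pub/Sub client import is deferred so the core logic can be exercised without Google Cloud credentials.

```python
import json


def build_completion_message(job_name: str, table: str, state: str) -> bytes:
    """Encode a small JSON payload describing the finished job (fields are illustrative)."""
    payload = {"job_name": job_name, "table": table, "state": state}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def run_and_notify(pipeline, publish, job_name: str, table: str) -> str:
    """Run the pipeline, block until it ends, and publish only on success.

    `pipeline` is expected to behave like an apache_beam.Pipeline;
    `publish` is any callable that accepts the message bytes.
    """
    result = pipeline.run()
    # Blocks until the batch job reaches a terminal state.
    state = result.wait_until_finish()
    if state != "DONE":
        raise RuntimeError(f"Pipeline ended in state {state!r}; nothing published")
    publish(build_completion_message(job_name, table, state))
    return state


def make_pubsub_publisher(project_id: str, topic_id: str):
    """Return a publish callable backed by google-cloud-pubsub (imported lazily)."""
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    def publish(data: bytes) -> str:
        # publish() returns a future; result() waits for the message ID.
        return publisher.publish(topic_path, data=data).result()

    return publish
```

In a real driver this would be called as something like `run_and_notify(p, make_pubsub_publisher("my-project", "bq-load-done"), "my-job", "dataset.table")`, where the project and topic names are placeholders. The launching process has to stay alive until the job finishes, which is the main cost of this approach.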

Personally, I prefer to base this on the logs: when the Dataflow job finishes, I capture the end-of-job log entry and sink it into PubSub. That decouples the pipeline code from the next step.
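As a rough illustration of the log-based option, the sketch below shows the consuming side in Python: a handler that receives a log entry routed to PubSub by a Cloud Logging sink and decides whether it marks the end of a job. Several things here are assumptions, not something the answer specifies: the filter string, the `"Worker pool stopped."` marker text, the `job_id`/`job_name` resource labels, and the event shape (a dict with a base64-encoded `data` field, as delivered to a PubSub-triggered function). Check the actual log entries of your own job before relying on any of them.

```python
import base64
import json

# Assumed sink filter: Dataflow log entries whose text marks the end of a job.
END_OF_JOB_FILTER = (
    'resource.type="dataflow_step" AND textPayload:"Worker pool stopped."'
)
# Assumed marker text; verify it against the logs of a real job.
END_OF_JOB_MARKER = "Worker pool stopped."


def parse_finished_job(event: dict):
    """Return job identifiers if the routed log entry marks a finished job, else None."""
    # The sink delivers the log entry as JSON in the base64-encoded message data.
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if END_OF_JOB_MARKER not in entry.get("textPayload", ""):
        return None
    labels = entry.get("resource", {}).get("labels", {})
    return {"job_id": labels.get("job_id"), "job_name": labels.get("job_name")}
```

The handler only identifies which job ended; it does not check whether the job succeeded, so the downstream step may want to confirm the job state or the BigQuery table contents before processing.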

You can also have a look at Workflows. It's not really mature yet, but it is very promising for simple workflows like yours.

  • Posted by huangapple on September 10, 2020, 20:51:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63830098.html