Firestore -> BigQuery data mirroring










I am currently working on a project where I need to mirror data from Firestore to BigQuery for further analysis and reporting purposes. However, I want to exclude certain fields from being mirrored into BigQuery.

I have explored the Stream Mirror extension for Firestore, which simplifies the mirroring process in real-time. However, it doesn't provide direct control over excluding specific fields during the mirroring process.

I am seeking guidance on whether there is a way to achieve this functionality. My goal is to customize the mirroring process to exclude specific fields from Firestore documents before they are stored in BigQuery.

I would like to know if there are any recommended approaches or techniques to accomplish this. Are there any available tools, libraries, or methods that can help me achieve selective field exclusion during the mirroring process?

I appreciate any insights or suggestions that the community can provide. Thank you in advance for your assistance!


Answer (score: 1)



As explained in the extension's documentation, the extension allows the use of a Transform Cloud Function for converting the Firestore data before it is written to BigQuery.

The transform function should be an HTTP Cloud Function with the following logic: get the input object from the request, transform it, and send it back in the response, as shown in the Cloud Function (CF) skeleton below:

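const functions = require("firebase-functions");

exports.bqTransform = functions.https.onRequest(async (req, res) => {
   const inputPayload = req.body; // JS object sent by the extension
   // ...
   // Transform the object
   // ...
   const outputPayload = { ...inputPayload }; // JS object (placeholder: replace with the transformed object)
   res.send(outputPayload);
});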

As explained in the doc, the inputPayload object (i.e. req.body) contains a data property, an array holding a representation of the Firestore document, as shown below:

  data: [{
    insertId: int;
    json: {
      timestamp: int;
      event_id: int;
      document_name: string;
      document_id: int;
      operation: ChangeType;
      data: string;  // <= String containing the stringified object representing the Firestore document data
    }
  }]
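Because json.data arrives as a string, a transform has to parse it before it can read or drop individual fields. The sketch below builds a made-up request body in the shape described above (all values, including the "CREATE" operation value, are assumptions) and parses the document data; parseDocumentData is a hypothetical helper name, not part of the extension.

```javascript
// Hypothetical sample of the request body the extension POSTs to the transform
// function. Values are made up; only the shape follows the description above.
const sampleBody = {
  data: [
    {
      insertId: 1,
      json: {
        timestamp: 1687762549000,
        event_id: 42,
        document_name: "projects/my-project/databases/(default)/documents/users/alice",
        document_id: 7,
        operation: "CREATE", // assumed representation of ChangeType
        data: JSON.stringify({ foo: "bar", password: "secret", age: 30 }),
      },
    },
  ],
};

// json.data is a JSON string, so it must be parsed before fields can be read.
function parseDocumentData(row) {
  const raw = row.json.data;
  // Assumption: an empty or missing string (e.g. for a delete) means "no fields".
  return raw ? JSON.parse(raw) : {};
}

const doc = parseDocumentData(sampleBody.data[0]);
console.log(doc.foo); // prints "bar"
```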

The transformation implemented in your Cloud Function code shall create an object with the same structure (outputPayload in our skeleton example above) where the data[0].json property is adapted according to your transformation requirements.
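Since the goal is to exclude specific fields, here is a minimal sketch of such a transformation, assuming the fields to drop are top-level keys of the Firestore document. EXCLUDED_FIELDS, omitFields and excludeFieldsFromPayload are hypothetical names. Each row keeps all its other properties and only json.data is rewritten. Rather than reading only data[0], it maps over every element of the data array, in case the extension sends several rows in one request (an assumption, not confirmed by the doc quoted here).

```javascript
// Hypothetical list of top-level Firestore fields that must not reach BigQuery.
const EXCLUDED_FIELDS = ["password", "ssn"];

// Returns a shallow copy of `obj` without the listed top-level keys.
function omitFields(obj, fields) {
  const copy = { ...obj };
  for (const field of fields) {
    delete copy[field];
  }
  return copy;
}

// Builds the response body: same structure as the input, only json.data changed.
function excludeFieldsFromPayload(inputPayload, fields = EXCLUDED_FIELDS) {
  return {
    data: inputPayload.data.map((row) => {
      const raw = row.json.data;
      // Assumption: a row with an empty/missing data string is passed through as-is.
      if (!raw) return row;
      const docData = JSON.parse(raw);
      return {
        ...row,
        json: { ...row.json, data: JSON.stringify(omitFields(docData, fields)) },
      };
    }),
  };
}

const example = {
  data: [{ insertId: 1, json: { operation: "UPDATE", data: JSON.stringify({ foo: "bar", password: "secret" }) } }],
};
console.log(excludeFieldsFromPayload(example).data[0].json.data); // prints {"foo":"bar"}
```

Inside the Cloud Function this would be called as res.send(excludeFieldsFromPayload(req.body));. Nested fields (for example a key inside a map field) are not handled by this sketch.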

Here is a very simple example in which we completely replace the content of the Firestore record with just the foo field of the Firestore document plus some static data.

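const functions = require("firebase-functions");

exports.bqTransform = functions.https.onRequest(async (req, res) => {

    const inputPayload = req.body;
    const inputData = inputPayload.data[0];

    // json.data holds the Firestore document as a JSON string, so parse it to read its fields
    const firestoreDoc = JSON.parse(inputData.json.data);

    const outputPayload = [{
        insertId: inputData.insertId,
        json: {
            timestamp: inputData.json.timestamp,
            event_id: inputData.json.event_id,
            document_name: inputData.json.document_name,
            document_id: inputData.json.document_id,
            operation: inputData.json.operation,
            // Keep only `foo` from the document and add some static data
            data: JSON.stringify({ foo: firestoreDoc.foo, array: ["a1", "a2"], name: "Transformed Name" })
        }
    }];

    res.send({ data: outputPayload });
});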
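To check a transform before wiring it into the extension, the handler logic can be exercised locally with fake request and response objects. The sketch below is a hypothetical harness: bqTransformHandler and makeFakeResponse are illustrative names, the handler reimplements the example's logic for every row in data without the firebase-functions wrapper, and the fake response implements only send.

```javascript
// Same idea as the example above, as a plain (req, res) handler applied to every row.
async function bqTransformHandler(req, res) {
  const outputPayload = req.body.data.map((row) => ({
    insertId: row.insertId,
    json: {
      ...row.json, // keep timestamp, event_id, document_name, document_id, operation
      data: JSON.stringify({
        foo: JSON.parse(row.json.data).foo,
        array: ["a1", "a2"],
        name: "Transformed Name",
      }),
    },
  }));
  res.send({ data: outputPayload });
}

// Minimal stand-in for the response object: it only records what was sent.
function makeFakeResponse() {
  const res = {
    sentBody: undefined,
    send(body) {
      res.sentBody = body;
      return res;
    },
  };
  return res;
}

const fakeReq = {
  body: {
    data: [{
      insertId: 1,
      json: {
        timestamp: 0,
        event_id: 1,
        document_name: "users/alice",
        document_id: 1,
        operation: "CREATE",
        data: JSON.stringify({ foo: "bar", secret: "x" }),
      },
    }],
  },
};
const fakeRes = makeFakeResponse();
bqTransformHandler(fakeReq, fakeRes).then(() => {
  console.log(fakeRes.sentBody.data[0].json.data);
  // prints {"foo":"bar","array":["a1","a2"],"name":"Transformed Name"}
});
```

Once it behaves as expected, the same (req, res) function can be passed to functions.https.onRequest.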

  • Posted on June 26, 2023, 14:55:49



