Firestore -> BigQuery mirroring

Question


I am working on a project where I need to mirror data from Firestore to BigQuery for further analysis and reporting. However, I want to exclude certain fields from being mirrored into BigQuery.

I have explored the Stream Firestore to BigQuery extension, which mirrors document changes to BigQuery in real time. However, it doesn't offer a direct way to exclude specific fields during mirroring.

Is there a way to customize the mirroring process so that specific fields are removed from Firestore documents before they are stored in BigQuery?

Are there any recommended approaches, tools, or libraries for selective field exclusion during mirroring?

I appreciate any insights or suggestions that the community can provide. Thank you in advance for your assistance!

Answer 1

Score: 1


As explained in the extension's documentation, the extension can call a Transform Function, whose URL you set in the extension's "Transform function URL" parameter, to convert the Firestore data before it is written to BigQuery.

The Transform Function must be an HTTP Cloud Function with the following logic: get the input object from the request, transform it, and send it back in the response, as in the Cloud Function skeleton below:

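const functions = require("firebase-functions");

exports.bqTransform = functions.https.onRequest(async (req, res) => {
    const inputPayload = req.body; // JS object sent by the extension: { data: [...] }
    // ...
    // Transform the object
    // ...
    const outputPayload = inputPayload; // placeholder: replace with the transformed object
    res.send(outputPayload);
});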

As explained in the documentation, the inputPayload object (i.e. req.body) has a data property: an array whose elements each represent a change to a Firestore document, as shown below:

{ 
  data: [{
    insertId: int;
    json: {
      timestamp: int;
      event_id: int;
      document_name: string;
      document_id: int;
      operation: ChangeType;
      data: string;  // <= String containing the stringified object representing the Firestore document data
    },
  }]
}
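Because json.data arrives as a string rather than as an object, any Firestore field has to be read through JSON.parse first. A minimal sketch of reading the rows of a request, assuming a hypothetical helper named parseDocumentData and an illustrative request body with made-up values (document_id is written as a string here because Firestore document IDs are strings):

```javascript
// Illustrative request body shaped like the payload above (made-up values)
const sampleBody = {
  data: [{
    insertId: 1,
    json: {
      timestamp: 1687762549000,
      event_id: 1,
      document_name: "projects/demo/databases/(default)/documents/users/alice",
      document_id: "alice",
      operation: "CREATE",
      data: JSON.stringify({ foo: "bar", email: "alice@example.com" }),
    },
  }],
};

// Hypothetical helper: json.data is a JSON string, so it must be parsed
// before any Firestore field can be read. Returns null when there is no data.
function parseDocumentData(row) {
  const raw = row.json.data;
  return raw ? JSON.parse(raw) : null;
}

for (const row of sampleBody.data) {
  const doc = parseDocumentData(row);
  console.log(typeof row.json.data); // prints string
  console.log(doc.foo);              // prints bar
}
```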

The transformation implemented in your Cloud Function must return an object with the same structure (outputPayload in the skeleton above), in which the json property of each row, and in particular json.data, is adapted to your transformation requirements.


Here is a very simple example that completely replaces the content of each Firestore record with the document's foo field plus some static data.

const functions = require("firebase-functions");

exports.bqTransform = functions.https.onRequest(async (req, res) => {
    const inputPayload = req.body;

    // Transform every row in the payload, not only data[0]
    const outputPayload = inputPayload.data.map((row) => {
        // Rows without document data are passed through unchanged
        if (!row.json.data) return row;

        // json.data is a JSON string: parse it before reading the foo field
        const docData = JSON.parse(row.json.data);

        return {
            insertId: row.insertId,
            json: {
                timestamp: row.json.timestamp,
                event_id: row.json.event_id,
                document_name: row.json.document_name,
                document_id: row.json.document_id,
                operation: row.json.operation,
                data: JSON.stringify({ foo: docData.foo, array: ["a1", "a2"], name: "Transformed Name" })
            },
        };
    });

    res.send({ data: outputPayload });
});
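The question asks specifically how to exclude fields. Under the same request/response contract, a transform that drops a list of top-level fields could look like the minimal sketch below. The excludeFields function and the EXCLUDED_FIELDS list (email, passwordHash) are hypothetical names chosen for illustration, and the request body uses made-up values; the function is kept free of Firebase imports so it can be tested on its own.

```javascript
// Hypothetical list of Firestore fields to keep out of BigQuery (adjust to your data)
const EXCLUDED_FIELDS = ["email", "passwordHash"];

// Returns a payload with the same structure as the input, minus the excluded
// top-level fields in each row's json.data. The input object is not modified.
function excludeFields(payload, excluded) {
  return {
    data: payload.data.map((row) => {
      // Leave rows without document data untouched
      if (!row.json.data) return row;

      // json.data is a JSON string: parse, drop the fields, serialize again
      const doc = JSON.parse(row.json.data);
      for (const field of excluded) {
        delete doc[field];
      }

      // Copy every other property (insertId, timestamp, document_id, ...) unchanged
      return { ...row, json: { ...row.json, data: JSON.stringify(doc) } };
    }),
  };
}

// Illustrative request body (made-up values)
const body = {
  data: [{
    insertId: 1,
    json: {
      timestamp: 1687762549000,
      event_id: 1,
      document_name: "projects/demo/databases/(default)/documents/users/alice",
      document_id: "alice",
      operation: "CREATE",
      data: JSON.stringify({ name: "Alice", email: "alice@example.com", plan: "pro" }),
    },
  }],
};

const result = excludeFields(body, EXCLUDED_FIELDS);
console.log(result.data[0].json.data); // prints {"name":"Alice","plan":"pro"}
```

To use it, call it from the bqTransform HTTP function, e.g. res.send(excludeFields(req.body, EXCLUDED_FIELDS)). The sketch only removes top-level fields; nested fields would need a path-based delete. It also only filters json.data, so check whether your extension version sends other copies of the document (for example an old_data field) that need the same treatment.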

huangapple
  • This article was published on June 26, 2023 at 14:55:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76554195.html