Manage a BigQuery table loaded from Google Sheets
Question
I have many tables that I ETL with Airflow from Google Sheets into BigQuery, and these tables use the write-truncate method.
My problem is that these tables are used for weekly financial reporting, with a cut-off at the beginning of each week.
How can I manage this so that the data sent to finance does not change when it is viewed later, even though the data entered in Google Sheets is often changed?
Answer 1
Score: 2
Some things to consider:
- The most obvious: instead of write-truncate, use the write-append method to add new data to the existing table without overwriting the previous data. This requires changes to your current queries and reports.
- Create a separate table for past weeks' financial reporting and use write-append to add data to it each week. This may be easier than the first option, because you can add a "reported week" identifier as you append the most recently reported week.
- Use a scheduling tool such as Apache Airflow to automate the ETL process that loads data from Google Sheets into BigQuery (i.e. to avoid manual interventions and/or mistakes).
- Set up permissions and access controls so that only authorized personnel can view or modify the data, and maintain this security model to ensure data integrity.
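A minimal sketch of the second suggestion (a weekly snapshot table with a "reported week" identifier), assuming the live table keeps being reloaded with write-truncate and a separate history table is only ever appended to. The table names, the `reported_week` column, and the Monday cut-off are hypothetical; the history table is assumed to have `reported_week DATE` as its first column followed by the same columns as the live table. The helpers only build BigQuery Standard SQL strings, which you could then submit as a query job from an Airflow task (for example via the Google provider's `BigQueryInsertJobOperator`).

```python
from datetime import date, timedelta


def week_cutoff(run_date: date) -> date:
    """Return the Monday of the week containing run_date (assumed cut-off day)."""
    return run_date - timedelta(days=run_date.weekday())


def build_snapshot_sql(source_table: str, snapshot_table: str, reported_week: date) -> str:
    """Append a frozen copy of the live table, tagged with its reported week.

    The NOT EXISTS guard makes the statement write-once per week: once a
    week has been frozen, re-running the job (a retry or a backfill) inserts
    nothing, so later edits in Google Sheets cannot change what finance saw.
    """
    week = reported_week.isoformat()
    return (
        f"INSERT INTO `{snapshot_table}`\n"
        f"SELECT DATE '{week}' AS reported_week, *\n"
        f"FROM `{source_table}`\n"
        f"WHERE NOT EXISTS (\n"
        f"  SELECT 1 FROM `{snapshot_table}` WHERE reported_week = DATE '{week}'\n"
        f");"
    )


def build_report_sql(snapshot_table: str, reported_week: date) -> str:
    """Query that finance reports can use to read one frozen week."""
    return (
        f"SELECT * FROM `{snapshot_table}` "
        f"WHERE reported_week = DATE '{reported_week.isoformat()}'"
    )


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    cutoff = week_cutoff(date(2023, 5, 10))
    print(build_snapshot_sql("proj.finance.sheet_live",
                             "proj.finance.sheet_weekly_snapshot", cutoff))
    print(build_report_sql("proj.finance.sheet_weekly_snapshot", cutoff))
```

The design choice here is to never update or delete rows in the history table: each week is written once and then only read, which is what keeps past reports stable while the live table stays a mirror of the sheet.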