2023年3月7日 22:13:12go评论178阅读模式

英文:

Creating Schema Agnostic Backups in BigQuery / Creating Table from JSON type through query

问题

感谢@NickW的提醒，通过BigQuery的本机快照功能，整个问题变得无关紧要...最好避免重新发明轮子，而是使用框架支持的内容！我对这个解决方案感到满意。

我有以下问题：
我想创建无需模式的备份，也就是说，如果原始数据的模式发生更改，不会破坏我的备份。

我的想法是将其存储为JSON文件，然后查询它，就像这样（在BigQuery SQL中）：

WITH aggregated_table as (
  -- 将所有数据聚合成一个JSON文件
  SELECT id ,TO_JSON(row_struct) AS data
  -- 将每一行提取为STRUCT
  FROM mytable row_struct)

-- 备份使用时间作为主键
SELECT CURRENT_TIMESTAMP() as backup_time, id, data from aggregated_table

唯一的问题是备份的恢复非常繁琐，我没有找到让它自动识别模式或至少自动识别列名的方法。

在我的情况下，表格有300多列，我不想像这样指定每一列：

SELECT
  backup_time,
  data.id as id,
  TIMESTAMP(data.some_time_value) as some_time_value,
  TIMESTAMP(data.some_string_value, &quot;) as some_string_value,
  --...另外300多列...
FROM my_backup WHERE backup_time = xxx

我已经长时间研究了BigQuery上的JSON文档，但我找不到适合我的情况的功能...

是否有人有解决方案？也许像这样使用JSON文件工作是一种反模式，还有其他方法可以做到吗？

非常感谢！

我期望有一个类似于展开JSON的功能，基本上可以逆转我为备份所做的打包。我知道Python中的Pandas库可以从JSON中填充表格，但我想找到在BigQuery SQL中执行此操作的方法。

英文:

Edit: Thanks to @NickW I was shown that with the Native Snapshot function of BigQuery the whole question becomes irrelevant ... it is better avoid reinventing the weel but use what is supported by the framework insted! I'm happy with the solution.

I have the following problem:
I want to create backups that are schema-agnostic, meaning, if the schema (of my raw data) is changed it doesn't fry up my backups.

My idea was to store it into a JSON file, and then query it, like this (in BigQuery SQL):

WITH aggregated_table as (
  -- Aggregate all data into one JSON file
  SELECT id ,TO_JSON(row_struct) AS data
  -- Extract each row as a STRUCT
  FROM mytable row_struct)

-- Backups use time as primary key
SELECT CURRENT_TIMESTAMP() as backup_time, id, data from aggregated_table

The only issue is that the recovery of the backups is really tedious and I haven't found a way to make it recognize the schema, or at least the column names by itself.

In my case the table has 300+ columns and I don't want to specify every single one like this:

SELECT
  backup_time,
  data.id as id,
  TIMESTAMP(data.some_time_value) as some_time_value,
  TIM(data.some_string_value, &quot;) as some_string_value,
  --...300 more columns ...
FROM my_backup WHERE backup_time = xxx

I have looked at the JSON documentations on BigQuery for a long time, but I couldn't find the functionality that fits my case ...

Does anyone have a solution? Maybe working with JSON files like this is an antipattern and there is another way to do it?

Many thanks!

I expected a function like flatten JSON that basically reverses my packing I did for the backup. I know the Pandas library in Python can fill table from JSONs but I want to find a way to do it in BigQuery SQL.

答案1

得分: 0

不要自行创建解决方案，我建议从现有的一些内置过程开始（如表格导出、快照等）。

英文:

Comment converted to answer…

Instead of rolling your own solution, I would start with one of the processes available out-of-the-box (table export, snapshots, etc)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在BigQuery中创建无模式架构的备份 / 通过查询从JSON类型创建表

问题

答案1

将地图编组为JSON时丢失数据。

Oracle SQL排除列为空的行

Golang SQL查询语法验证器

CSV缺少第一行的JQ原始输入

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论