Database design in BigQuery
Question
I have a main table called "Acquisition" with multiple columns that reference other tables (e.g. "Source", "Application", etc.). For example, "Source" would have multiple possible values, each used in multiple rows of the "Acquisition" table. What bothers me a bit is that the rows of the "Acquisition" table would return data that looks like this:
id > 1 ; value > 23.4 ; source_id > 1 ; application_id > 3 ; platform_id > 1 ; country_id > 1 ; etc.
Do you think there's another way to design it to make it more readable / user-friendly?
Here's an extract of the code of the schema:
// Requires: import "cloud.google.com/go/bigquery"
var (
	// acquisitionSchema is the main table; each *_id column references a lookup table.
	acquisitionSchema = bigquery.Schema{
		&bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.FloatFieldType},
		&bigquery.FieldSchema{Name: "source_id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "application_id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "platform_id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "country_id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "adtype_id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "date", Required: true, Type: bigquery.DateFieldType}, // exported constant is DateFieldType
		&bigquery.FieldSchema{Name: "download", Required: false, Type: bigquery.IntegerFieldType},
	}

	// sourceSchema is one of the lookup tables: id -> display value.
	sourceSchema = bigquery.Schema{
		&bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
		&bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.StringFieldType},
	}
)
I thought of directly storing the values of the source, platform, etc., but that might get messy since I get my data from multiple sources through APIs, unless I add all the necessary checks in my code.
Thanks!
Answer 1
Score: 2
Usually we use a RECORD that has two columns (id, name):
country (RECORD)
  |- id
  |- name
This way, in our queries we can use country.id to query by integer, or country.name to display the value for quick inspection.
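To make the RECORD idea concrete for the schema in the question, here is a minimal Go sketch (standard library only) of resolving ids against in-memory copies of the lookup tables at write time and emitting one row as newline-delimited JSON, a format BigQuery can load into nested RECORD columns. The type and function names (Ref, AcquisitionRow, resolve, toNDJSONLine) and the sample values are hypothetical; ids stay strings to match the question's schema, and only two dimensions are shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ref mirrors the two-column RECORD: an id for filtering and joins,
// plus the human-readable name for quick inspection.
type Ref struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// AcquisitionRow is a hypothetical denormalized row: each dimension is a
// nested {id, name} record instead of a bare *_id column.
type AcquisitionRow struct {
	ID      string  `json:"id"`
	Value   float64 `json:"value"`
	Source  Ref     `json:"source"`
	Country Ref     `json:"country"`
	Date    string  `json:"date"` // YYYY-MM-DD, the form a DATE column accepts in JSON loads
}

// resolve looks up the display name for id in a lookup table held in memory
// (e.g. the id/value rows of the "Source" table) and fails on unknown ids,
// which is where inconsistent API data would be caught.
func resolve(lookup map[string]string, id string) (Ref, error) {
	name, ok := lookup[id]
	if !ok {
		return Ref{}, fmt.Errorf("unknown id %q", id)
	}
	return Ref{ID: id, Name: name}, nil
}

// toNDJSONLine encodes one row as a single JSON line.
func toNDJSONLine(row AcquisitionRow) (string, error) {
	b, err := json.Marshal(row)
	return string(b), err
}

func main() {
	// Hypothetical lookup tables.
	sources := map[string]string{"1": "source-a"}
	countries := map[string]string{"1": "France"}

	src, err := resolve(sources, "1")
	if err != nil {
		panic(err)
	}
	country, err := resolve(countries, "1")
	if err != nil {
		panic(err)
	}

	row := AcquisitionRow{ID: "1", Value: 23.4, Source: src, Country: country, Date: "2023-01-15"}
	line, err := toNDJSONLine(row)
	if err != nil {
		panic(err)
	}
	// {"id":"1","value":23.4,"source":{"id":"1","name":"source-a"},"country":{"id":"1","name":"France"},"date":"2023-01-15"}
	fmt.Println(line)
}
```

Each query can then filter on source.id and still show source.name without a join.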
Since storage is cheap nowadays, we can afford to store the literal representation in every column. BigQuery is append-only by design, and we usually read the most recent row, which already contains the fresh value if the name has changed in the meantime. Using the LAST_VALUE function, we can always pick the last record, which holds the latest name.
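The answer doesn't show the LAST_VALUE query, so here is a hedged sketch. The SQL is held in a Go constant (the table name mydataset.acquisition and the nested country/date columns are assumptions), and latestNames is a hypothetical in-memory function that reproduces what the window computes, so the behavior can be checked without a BigQuery project. Note the explicit ROWS frame: with ORDER BY but no frame clause, the default frame ends at the current row, so LAST_VALUE would simply return each row's own name.

```go
package main

import (
	"fmt"
	"sort"
)

// latestNameQuery is an illustrative BigQuery SQL query; the table and
// column names are assumptions. Pass it to the client's Query method.
const latestNameQuery = `
SELECT
  id,
  country.id AS country_id,
  LAST_VALUE(country.name) OVER (
    PARTITION BY country.id
    ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS latest_country_name
FROM mydataset.acquisition`

// observation is a hypothetical in-memory stand-in for one appended row.
type observation struct {
	CountryID string
	Name      string
	Date      string // YYYY-MM-DD, so string order matches date order
}

// latestNames mirrors the window above: for each country id it keeps the
// name from the row with the latest date. Ties keep input order, so the
// later row in the slice wins (the SQL version leaves ties unspecified).
func latestNames(rows []observation) map[string]string {
	sorted := append([]observation(nil), rows...) // copy; don't reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Date < sorted[j].Date })
	latest := make(map[string]string, len(sorted))
	for _, r := range sorted {
		latest[r.CountryID] = r.Name // later dates overwrite earlier ones
	}
	return latest
}

func main() {
	fmt.Println(latestNameQuery)
	rows := []observation{
		{CountryID: "1", Name: "Great Britain", Date: "2023-03-01"},
		{CountryID: "1", Name: "GB", Date: "2023-01-01"},
		{CountryID: "2", Name: "France", Date: "2023-02-01"},
	}
	fmt.Println(latestNames(rows)) // map[1:Great Britain 2:France]
}
```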