Database design in BigQuery

Question

I have a main table called "Acquisition" with multiple columns that reference other tables ("Source", "Application", etc.). For example, "Source" has a set of possible values, each of which is used in many rows of the "Acquisition" table. What bothers me a bit is that the rows of the "Acquisition" table would return data that looks like this:

id > 1; value > 23.4; source_id > 1; application_id > 3; platform_id > 1; country_id > 1; etc.

Do you think there's another way to design it to make it more readable and user-friendly?

Here's an extract of the schema code:

import "cloud.google.com/go/bigquery"

// Main fact table: each *_id column references a row in a lookup table.
var acquisitionSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.FloatFieldType},
    &bigquery.FieldSchema{Name: "source_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "application_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "platform_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "country_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "adtype_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "date", Required: true, Type: bigquery.DateFieldType},
    &bigquery.FieldSchema{Name: "download", Required: false, Type: bigquery.IntegerFieldType},
}

// Lookup table: maps a source id to its display value.
var sourceSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.StringFieldType},
}

I thought of storing the value of the source, platform, etc. directly in the "Acquisition" table, but that might get messy because I get my data from multiple sources through APIs, unless I do all the necessary validation in my code.

Thanks!

Answer 1

Score: 2

Usually we use a RECORD with two sub-fields (id, name):

-country
 |id
 |name

This way, in our queries we can use country.id to filter by integer, or country.name to display the value for quick inspection.
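As a rough illustration of that layout, here is a minimal Go sketch using only the standard library. The type names (Dimension, Acquisition), the trimmed-down field list, and the sample values are assumptions made for the example, not part of the original schema. It only shows how a row with nested (id, name) records looks when encoded as one line of JSON, the shape used for newline-delimited JSON loads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dimension mirrors a RECORD column with two sub-fields (id, name).
type Dimension struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

// Acquisition is a hypothetical, trimmed-down row: each former *_id column
// becomes a nested Dimension.
type Acquisition struct {
	ID      string    `json:"id"`
	Value   float64   `json:"value"`
	Source  Dimension `json:"source"`
	Country Dimension `json:"country"`
	Date    string    `json:"date"` // YYYY-MM-DD
}

// encodeRow renders one row as a single line of JSON.
func encodeRow(a Acquisition) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values are made up for illustration.
	row := Acquisition{
		ID: "1", Value: 23.4,
		Source:  Dimension{ID: 1, Name: "facebook"},
		Country: Dimension{ID: 1, Name: "France"},
		Date:    "2017-03-24",
	}
	line, err := encodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

With this shape, a reader inspecting a row sees the name next to each id instead of a bare source_id or country_id.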

Since storage is cheap nowadays, we can afford to store the literal name alongside the id for every such column. Since BQ is append-only by design and we usually read the most recent row, that row already contains the current name even if the name has changed in the meantime. Using the LAST_VALUE function, we can always pick the name from the most recent record.
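The "latest row wins" idea can be sketched in plain Go. This is an in-memory illustration of the selection logic, not BigQuery code: the Row type, the latestNames helper, and the sample data are all hypothetical. It assumes dates are stored as YYYY-MM-DD strings, so string comparison matches chronological order.

```go
package main

import "fmt"

// Row is a hypothetical appended row carrying a denormalized country record.
type Row struct {
	Date        string // YYYY-MM-DD, so string order matches date order
	CountryID   int64
	CountryName string
}

// latestNames returns, for each country id, the name found on the most
// recent row. Ties on date are resolved in favor of the later slice element.
func latestNames(rows []Row) map[int64]string {
	names := make(map[int64]string)
	newest := make(map[int64]string) // most recent date seen per id
	for _, r := range rows {
		if d, ok := newest[r.CountryID]; !ok || r.Date >= d {
			newest[r.CountryID] = r.Date
			names[r.CountryID] = r.CountryName
		}
	}
	return names
}

func main() {
	// Made-up sample: country 1 was renamed between two appended rows.
	rows := []Row{
		{"2017-01-10", 1, "Czech Republic"},
		{"2017-03-24", 1, "Czechia"},
		{"2017-02-01", 2, "France"},
	}
	names := latestNames(rows)
	fmt.Println(names[1], "/", names[2])
}
```

Older rows keep the name that was current when they were written, and anything that needs the current name reads it from the newest row for that id.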

huangapple
  • Published on March 24, 2017 at 04:26:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/42986325.html