Fact Table and Grain - Repeating Measures?
Question
Suppose I have the tables shown below. I'm trying to create the correct fact tables.
If I combine both tables into one fact table, I would be repeating the sale measure, since each sale comes with at least two charge items (admin fee, cleaning, and so on; the number of items can vary) that contribute to the overall sale. For car 1, the overall sale would therefore be 5000 + 50 + 100. I cannot simply aggregate either, because sales need to be itemized.
Do I need to create two fact tables, or do I need to approach this in an altogether different manner? I'm tempted to link Car Sale Table to Charge Table in a 1:n fashion.
Do you have any suggestions?
Car Sale Table
car_id | sale | currency |
---|---|---|
1 | 5000 | USD |
2 | 7000 | USD |
Charge Table
charge_id | type | amount | currency | car_id |
---|---|---|---|---|
14 | admin fee | 50 | USD | 1 |
15 | cleaning | 100 | USD | 1 |
22 | parking | 10 | USD | 2 |
25 | cleaning | 70 | USD | 2 |
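
The repetition described in the question can be reproduced with a small sketch. It loads the two sample tables into an in-memory SQLite database and joins them, which is what a naively combined fact table would amount to. The table names `car_sale` and `charge` are assumptions chosen for illustration; only the column names and values come from the question.

```python
import sqlite3

# In-memory database holding the two sample tables from the question.
# Table names car_sale and charge are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_sale (car_id INTEGER PRIMARY KEY, sale INTEGER, currency TEXT);
CREATE TABLE charge (
    charge_id INTEGER PRIMARY KEY,
    type TEXT,
    amount INTEGER,
    currency TEXT,
    car_id INTEGER REFERENCES car_sale(car_id)
);
INSERT INTO car_sale VALUES (1, 5000, 'USD'), (2, 7000, 'USD');
INSERT INTO charge VALUES
    (14, 'admin fee', 50, 'USD', 1),
    (15, 'cleaning', 100, 'USD', 1),
    (22, 'parking', 10, 'USD', 2),
    (25, 'cleaning', 70, 'USD', 2);
""")

# Joining copies the sale measure onto every charge row of the same car,
# so summing it counts each sale once per charge.
naive = conn.execute("""
    SELECT s.car_id, SUM(s.sale) AS sale_total, SUM(c.amount) AS charge_total
    FROM car_sale AS s
    JOIN charge AS c ON c.car_id = s.car_id
    GROUP BY s.car_id
    ORDER BY s.car_id
""").fetchall()

print(naive)  # [(1, 10000, 150), (2, 14000, 80)]
```

Each car has two charges in the sample data, so the summed sale comes out at twice its true value (10000 instead of 5000 for car 1).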
Answer 1
Score: 1
Denormalizing the measure (here `sale`) is a very bad idea: as you noted, it causes the sum calculation to fail.
But you do not need to combine your two fact tables in this way.
If your main goal is to calculate summaries per car, simply add a new `charge_id` (say 100, with `type = sale`) and add the data to the second table.
So the fact table will contain three rows for `car_id = 1`:

charge_id | type | amount |
---|---|---|
14 | admin fee | 50 |
15 | cleaning | 100 |
100 | sale | 5000 |

The first table will no longer be needed.
The total cost calculation will be a plain aggregation of `amount` per `car_id`.
You'll want to add some other attributes such as booking and validity timestamps.
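
A minimal sketch of this single-table approach, using Python's built-in `sqlite3` module. The table name `fact_charge` and the `charge_id` value 101 for the second car's sale row are assumptions for illustration; the answer only suggests 100 for the first car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One fact table at the grain of one row per charge line.
-- The sale itself is stored as just another line with type = 'sale'.
CREATE TABLE fact_charge (
    charge_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    amount INTEGER NOT NULL,
    currency TEXT NOT NULL
);
INSERT INTO fact_charge VALUES
    (14, 1, 'admin fee', 50, 'USD'),
    (15, 1, 'cleaning', 100, 'USD'),
    (100, 1, 'sale', 5000, 'USD'),
    (22, 2, 'parking', 10, 'USD'),
    (25, 2, 'cleaning', 70, 'USD'),
    (101, 2, 'sale', 7000, 'USD');  -- charge_id 101 is an assumed value
""")

# Total per car is a plain aggregation; nothing is counted twice.
totals = conn.execute("""
    SELECT car_id, SUM(amount) AS total
    FROM fact_charge
    GROUP BY car_id
    ORDER BY car_id
""").fetchall()

print(totals)  # [(1, 5150), (2, 7080)]
```

Because every row sits at the same grain, the itemized view is simply the unaggregated rows, and the per-car total matches the 5000 + 50 + 100 from the question.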
Answer 2
Score: 0
If you are creating a Fact_Sales_Table, you should denormalize the data and combine the tables. Data modeling is somewhat subjective and depends on the requirements.
If you want one sales fact table, consider a Sales Type Dimension that describes each itemized line item. The main fact table would then hold only the sales amount of each line item per overall car transaction.
Two fact tables seem unnecessary unless you need a different grain or have a specific requirement for two separate fact tables.
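
As a rough sketch of the sales type dimension idea, assuming hypothetical names (`dim_sale_type`, `fact_sales`, `sale_type_id`, `line_id`) and surrogate keys that do not appear in the original post:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension describing what each line item is.
CREATE TABLE dim_sale_type (
    sale_type_id INTEGER PRIMARY KEY,
    type_name TEXT NOT NULL UNIQUE
);
-- Fact table at line-item grain; the car itself is one of the types.
CREATE TABLE fact_sales (
    line_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL,
    sale_type_id INTEGER NOT NULL REFERENCES dim_sale_type(sale_type_id),
    amount INTEGER NOT NULL,
    currency TEXT NOT NULL
);
INSERT INTO dim_sale_type VALUES
    (1, 'car'), (2, 'admin fee'), (3, 'cleaning'), (4, 'parking');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 5000, 'USD'),
    (2, 1, 2, 50, 'USD'),
    (3, 1, 3, 100, 'USD'),
    (4, 2, 1, 7000, 'USD'),
    (5, 2, 4, 10, 'USD'),
    (6, 2, 3, 70, 'USD');
""")

# Itemized lines for one car transaction.
car1_items = conn.execute("""
    SELECT t.type_name, f.amount
    FROM fact_sales AS f
    JOIN dim_sale_type AS t ON t.sale_type_id = f.sale_type_id
    WHERE f.car_id = 1
    ORDER BY f.line_id
""").fetchall()

# Totals by sale type across all cars.
by_type = conn.execute("""
    SELECT t.type_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_sale_type AS t ON t.sale_type_id = f.sale_type_id
    GROUP BY t.type_name
    ORDER BY t.type_name
""").fetchall()

print(car1_items)  # [('car', 5000), ('admin fee', 50), ('cleaning', 100)]
print(by_type)     # [('admin fee', 50), ('car', 12000), ('cleaning', 170), ('parking', 10)]
```

With the type moved into a dimension, the same fact table serves both the itemized view and roll-ups by type or by car.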
Maybe try this as part of your modeling effort:
Also, maybe rename dim_sale to something else (I just threw something together). I'm going this route because I assume there's more to the data than the tables/fields you provided.
Another example with the idea of adding the "car" transaction as a type: