事实表和粒度 – 重复度量?

huangapple go评论55阅读模式
英文:

Fact Table and Grain - Repeating Measures?

问题

以下是您要翻译的内容:

假设我有如下所示的表格。我正在尝试创建正确的事实表。

如果我将这两个表格合并成一个事实表,那么每个销售都会重复计算销售指标,因为每个销售至少包含两个项目(管理费、清洁等等 - 项目数量可能不同),这些项目会影响总销售额。对于汽车1,总销售额将为5000 + 50 + 100。我不能简单地聚合,因为销售需要逐项列出。

我是否需要创建两个事实表,还是需要以完全不同的方式来处理?我倾向于以1:n的方式将Car Sale Table与Charge Table关联起来。

您有什么建议?

Car Sale Table

car_id sale currency
1 5000 USD
2 7000 USD

Charge Table

charge_id type amount currency car_id
14 admin fee 50 USD 1
15 cleaning 100 USD 1
22 parking 10 USD 2
25 cleaning 70 USD 2
英文:

Suppose I have tables as indicated below. I'm trying to create correct fact tables.

If I combine both tables into one fact table, I would be repeating the sales measure since each sale contains at least two items (admin fee, cleaning and so on - number of items can vary) that contribute to the overall sale. For car 1, the overall sale would therefore be 5000 + 50 + 100. I cannot simply aggregate either because sales needs to be itemized.

Do I need to create two fact tables, or do I need to approach this in an altogether different manner? I'm tempted to link Car Sale Table to Charge Table in a 1:n fashion.

Do you have any suggestions?

Car Sale Table

car_id sale currency
1 5000 USD
2 7000 USD

Charge Table

charge_id type amount currency car_id
14 admin fee 50 USD 1
15 cleaning 100 USD 1
22 parking 10 USD 2
25 cleaning 70 USD 2

答案1

得分: 1

为了去规范化度量值(这里是 sale),这是一个非常糟糕的想法(正如您所指出的),这将导致总计算失败。

但是您无需以这种方式合并两个事实表。

如果您的主要目标是按车辆计算摘要信息,只需添加一个新的 charge_id(假设为 100 - 其中 type = sale),并将数据添加到第二个表中。

因此,事实表将包含 car_id = 1 的三行

charge_id 	type 	    amount
14 	        admin fee 	50
15 	        cleaning 	100
100         sales       5000

第一个表将不再需要。

总成本 计算将是对 car_idamount 进行简单的汇总。

您还会想要添加一些其他属性,例如预订和有效时间戳。

英文:

To denormalize the measures (here sale) is a very bad idea (as you noted) this will cause the sum calculation to fail.

But you do not need to combine you two fact tables in this way.

If your main goal is to calculate summaries per car, simple add a new charge_id (say 100 - with type = sale) and add the data to the second table.

So the fact table will contain three rows for the car_id = 1

charge_id 	type 	    amount
14 	        admin fee 	50
15 	        cleaning 	100
100         sales       5000

The first table will be not needed.

The total cost calculation will be a plain aggregation of the amount per car_id.

You'll want to add some other attributes such as booking and validity timestamps.

答案2

得分: 0

如果您正在创建一个 Fact_Sales_Table,您应该对数据进行反规范化处理并合并。数据建模在某种程度上是主观的,基于需求。

如果您想要一个销售事实表,也许可以考虑一个描述逐项销售的“销售类型维度”。因此,在您的主要事实表中,您只需要包括每个车辆交易的每个销售项目的销售金额。

除非需要不同的粒度或两个不同的事实表的要求,否则似乎不需要两个事实表?

也许在建模工作的一部分尝试这个:

另外,也许可以将 dim_sale 更改为其他名称(我只是匆匆提出一个想法)?我之所以采用这种方法,是因为我认为数据可能不仅仅是您提供的表/字段。

另一个示例是将“汽车”交易作为一种类型添加的想法:

英文:

If you are creating a Fact_Sales_Table you should denormalize the data and combine. Data modeling is somewhat subjective and based on the requirements.

If you want one sales fact table perhaps think of a Sales Type Dimension that describes the itemized line item sale. Therefore, in your main fact table you would only really include the sales amount of each line item per overall car transaction.

Two fact tables seem unnecessary unless there is a need for different grain or requirement for two different fact tables?

Maybe try this as part of your modeling effort:

事实表和粒度 – 重复度量?

Also maybe change dim_sale to another name (I just threw something together)? I'm going this route as I assume there's more to the data than the tables/fields you provided.

Another example with the idea of adding the "car" transaction as a type:

事实表和粒度 – 重复度量?

huangapple
  • 本文由 发表于 2023年5月14日 06:24:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76245109.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定