冗余数据在规范化的数据库结构中是否可接受?

huangapple go评论71阅读模式
英文:

Is redundant data an acceptable trade-off in a normalized database structure?

问题

在SQL中,我考虑以下问题。

我有一组A_id和一组B_id。

  • 唯一A_id的数量约为1,000个
  • 唯一B_id的数量约为1,000,000个

思路是对于每个A_id,都有一个B_id列表,可能在这个列表中有许多B_id(多对多关系)。

我可以简单地将它们存储在以下格式中:

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...

然而,我读到规范化,即简单地执行以下操作:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...

是更好的做法,但我担心拥有大量行的影响(即1,000,000,000+)。

我理解两者的缺点,但哪种做法更好的权衡呢?

英文:

In SQL I'm considering the following problem.

I have a list of A_ids and a list of B_ids.

  • the number of unique A_ids ~ 1.000s
  • the number of unique B_ids ~ 1.000.000s

The idea is that I for each A_id have a list of B_ids, with potentially many B_ids in this list (many to many).

I could simply store them in the format

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...

I however read that normalization i.e. simply doing:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...

is better practice but I fear the impact of having a huge amount of rows (i.e. 1.000.000.000+)

I understand the drawbacks with either but what is the better tradeoff?

答案1

得分: 1

规范化是要遵循的路径

  1. 对于现代数据库管理系统来说,这并不是特别多的行数
  2. 正如您会适当地为表创建索引,您只会访问由任何查询实际使用的表中的行,而不是执行完整的表扫描(除非您的查询需要进行完整的表扫描)
英文:

Normalisation is the route to follow

  1. For a modern DBMS, that’s not a particularly large number of rows
  2. As you would index the table appropriately, you would only access the rows in the table actually used by any query rather than do a full table scan (unless your query requires a full table scan)

huangapple
  • 本文由 发表于 2023年6月12日 23:56:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/76458357.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定