英文:
Is redundant data an acceptable trade-off in a normalized database structure?
问题
在SQL中,我考虑以下问题。
我有一组A_id和一组B_id。
- 唯一A_id的数量约为1,000个
- 唯一B_id的数量约为1,000,000个
思路是对于每个A_id,都有一个B_id列表,可能在这个列表中有许多B_id(多对多关系)。
我可以简单地将它们存储在以下格式中:
| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...
然而,我读到规范化,即简单地执行以下操作:
| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...
是更好的做法,但我担心拥有大量行的影响(即1,000,000,000+)。
我理解两者的缺点,但哪种做法更好的权衡呢?
英文:
In SQL I'm considering the following problem.
I have a list of A_ids and a list of B_ids.
- the number of unique A_ids ~ 1.000s
- the number of unique B_ids ~ 1.000.000s
The idea is that I for each A_id have a list of B_ids, with potentially many B_ids in this list (many to many).
I could simply store them in the format
| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...
I however read that normalization i.e. simply doing:
| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...
is better practice but I fear the impact of having a huge amount of rows (i.e. 1.000.000.000+)
I understand the drawbacks with either but what is the better tradeoff?
答案1
得分: 1
规范化是要遵循的路径
- 对于现代数据库管理系统来说,这并不是特别多的行数
- 正如您会适当地为表创建索引,您只会访问由任何查询实际使用的表中的行,而不是执行完整的表扫描(除非您的查询需要进行完整的表扫描)
英文:
Normalisation is the route to follow
- For a modern DBMS, that’s not a particularly large number of rows
- As you would index the table appropriately, you would only access the rows in the table actually used by any query rather than do a full table scan (unless your query requires a full table scan)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论