June 12, 2023 23:56:04

Is redundant data an acceptable trade-off in a normalized database structure?

Question


In SQL, I'm considering the following problem.

I have a list of A_ids and a list of B_ids.

The number of unique A_ids is ~1,000s.
The number of unique B_ids is ~1,000,000s.

The idea is that for each A_id I have a list of B_ids, potentially with many B_ids in each list (a many-to-many relationship).

I could simply store them in the format:

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...
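To make the main drawback of the comma-separated layout concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for whatever SQL database is actually in use (an assumption; the question does not name one). The table name `a_csv` and the extra row `(7, '12,21')` are hypothetical, added only to expose the pitfall: finding every A_id linked to a given B_id means pattern-matching the string in every row, and a naive `LIKE` also matches `12` and `21` when searching for `2`.

```python
import sqlite3

# Hypothetical table mirroring the comma-separated layout from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_csv (a_id INTEGER PRIMARY KEY, b_ids TEXT)")
conn.executemany(
    "INSERT INTO a_csv VALUES (?, ?)",
    [(1, "1,2,3,4,5"), (2, "1,2,4,5"), (3, "1"), (4, "1,2"),
     (5, "3,4"), (6, "2,3"),
     (7, "12,21")],  # hypothetical row, added to expose the substring pitfall
)

def a_ids_naive(b_id):
    # Plain substring match: searching for 2 also hits '12' and '21'.
    rows = conn.execute(
        "SELECT a_id FROM a_csv WHERE b_ids LIKE ? ORDER BY a_id",
        (f"%{b_id}%",),
    )
    return [r[0] for r in rows]

def a_ids_delimited(b_id):
    # Pad the list with commas on both sides so only whole elements match.
    # This is correct but still cannot use an index: every row is inspected.
    rows = conn.execute(
        "SELECT a_id FROM a_csv WHERE ',' || b_ids || ',' LIKE ? ORDER BY a_id",
        (f"%,{b_id},%",),
    )
    return [r[0] for r in rows]

print(a_ids_naive(2))      # [1, 2, 4, 6, 7]
print(a_ids_delimited(2))  # [1, 2, 4, 6]
```

Even the comma-padded query has to read every row's string, and the database cannot enforce that each listed B_id actually exists or appears only once per list.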

However, I've read that normalization, i.e. simply doing:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...
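The normalized layout is a classic junction (link) table. Below is a minimal sketch, again assuming SQLite via Python's sqlite3 module; the table name `a_b` and the index name `a_b_by_b` are hypothetical. The composite primary key rejects duplicate pairs and serves lookups by `a_id`, and the second index serves lookups by `b_id`, so both directions of the many-to-many relationship become index lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (a_id, b_id) pair. The composite primary key prevents duplicate
# pairs and indexes lookups by a_id; the extra index serves lookups by b_id.
conn.executescript("""
CREATE TABLE a_b (
    a_id INTEGER NOT NULL,
    b_id INTEGER NOT NULL,
    PRIMARY KEY (a_id, b_id)
);
CREATE INDEX a_b_by_b ON a_b (b_id, a_id);
""")

# The question's comma-separated sample, expanded to one pair per row.
csv_rows = {1: "1,2,3,4,5", 2: "1,2,4,5", 3: "1", 4: "1,2", 5: "3,4", 6: "2,3"}
pairs = [(a, int(b)) for a, s in csv_rows.items() for b in s.split(",")]
conn.executemany("INSERT INTO a_b (a_id, b_id) VALUES (?, ?)", pairs)

def b_ids_for(a_id):
    # All B_ids linked to one A_id (uses the primary-key index).
    rows = conn.execute(
        "SELECT b_id FROM a_b WHERE a_id = ? ORDER BY b_id", (a_id,))
    return [r[0] for r in rows]

def a_ids_for(b_id):
    # All A_ids linked to one B_id (uses the a_b_by_b index).
    rows = conn.execute(
        "SELECT a_id FROM a_b WHERE b_id = ? ORDER BY a_id", (b_id,))
    return [r[0] for r in rows]

print(b_ids_for(1))  # [1, 2, 3, 4, 5]
print(a_ids_for(2))  # [1, 2, 4, 6]
```

In a real schema, `a_id` and `b_id` would typically also be declared as foreign keys referencing the A and B tables; that is omitted here to keep the sketch short.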

is better practice, but I fear the impact of having a huge number of rows (i.e. 1,000,000,000+).

I understand the drawbacks of either approach, but which is the better trade-off?

Answer 1

Score: 1


Normalisation is the route to follow.

For a modern DBMS, that's not a particularly large number of rows.
Because you would index the table appropriately, a query would only access the rows it actually uses rather than doing a full table scan (unless the query itself requires a full table scan).
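One way to check the point about indexing is to ask the database how it plans to run each query. This is a sketch assuming SQLite (via Python's sqlite3) and a hypothetical `a_b` junction table with a composite primary key plus a second index on `(b_id, a_id)`. In SQLite, `EXPLAIN QUERY PLAN` reports `SEARCH` when a query seeks into an index and `SCAN` when it walks every row. The exact wording of the detail strings differs between SQLite versions, so the sketch prints the plans rather than hard-coding them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical junction table: one row per (a_id, b_id) pair.
conn.executescript("""
CREATE TABLE a_b (
    a_id INTEGER NOT NULL,
    b_id INTEGER NOT NULL,
    PRIMARY KEY (a_id, b_id)
);
CREATE INDEX a_b_by_b ON a_b (b_id, a_id);
""")

def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the
    # human-readable description of that step, which is all we keep.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Equality filters on indexed columns: expected to be reported as SEARCH.
by_a = plan("SELECT b_id FROM a_b WHERE a_id = ?", (1,))
by_b = plan("SELECT a_id FROM a_b WHERE b_id = ?", (2,))
# A query that needs every pair has to visit every row: expected SCAN.
full = plan("SELECT a_id, b_id FROM a_b")

for label, steps in (("by a_id", by_a), ("by b_id", by_b), ("all rows", full)):
    print(label, steps)
```

A B-tree index lookup touches roughly a logarithmic number of pages plus the matching rows, which is why a junction table with a billion rows stays workable as long as queries filter on an indexed column.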
