优化 Django 模型以适应大型数据表

huangapple go评论58阅读模式
英文:

Optimizing the Django model for large tables

问题

考虑一个拥有100万用户和10亿交易的银行。

所有10亿交易会放在一个表中吗?

1-20亿条记录是正常的还是太多了?

如果我们想要查看一个用户的交易,我们每次都要搜索整个表吗?
是这样运作的吗,还是有更好的方法?

英文:

Consider a bank with 1 million users and 1 billion transactions.

Will all one billion transactions be placed in one table?

Is 1-2 billion records normal or too much?

If we want to see the transactions of a user, do we search this entire table every time?
Is this the way it works or is there a better way?

答案1

得分: 2

No, 数据库保留索引。索引允许在数据库中高效搜索。对于ForeignKey,它会自动创建索引。

索引的实现取决于数据库,但通常是类似B树的树状结构,这意味着我们可以在*O(log n)*时间内找到要提取的行,这在记录数量较多时性能相当好。因此,数据库无需进行线性搜索。

如果您想要检索用户过去n年的索引,通常会创建一个复合索引,使得模型如下:

from django.conf import settings
from django.db import models

class Transaction(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    timestamp = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            models.<b>Index(</b>fields=['user', 'timestamp']<b>)</b>,
        ]

这将timestampuser字段合并为一个索引,使得检索给定用户过去一年的记录通常非常高效。

数据库被设计为尽可能高效地检索数据,因此只有在最后的情况下才会执行“线性扫描”,即数据库必须枚举所有记录。

然而,通常隐私规则要求银行在某个时候删除旧记录。如果我记得没错,在欧盟,银行被要求在一定年限后删除交易记录(尽管它们也被要求在某些年限内保留这些记录)。

英文:

> If we want to see the transactions of a user, do we search this entire table every time? Is this the way it works or is there a better way?

No, a database keeps indexes. Indexes allow searching in a database efficiently. For ForeignKeys, it makes indexes automatically.

The implementation of indexes depends on the database, but typically it is a tree-like structure like a B-tree, this means that we detect which rows to fetch in O(log n), which thus does scale quite well with the number of records. The database thus has not to do a linear search.

If you want to retrieve the indexes of a user for the last n years, you typically make a composite index, so the model looks like:

<pre><code>from django.conf import settings

class Transaction(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
timestamp = models.DateTimeField(auto_now_add=True)

class Meta:
    indexes = [
        models.&lt;b&gt;Index(&lt;/b&gt;fields=[&#39;user&#39;, &#39;timestamp&#39;]&lt;b&gt;)&lt;/b&gt;,
    ]&lt;/code&gt;&lt;/pre&gt;

This combines the timestamp and user fields in an index, making queries to retrieve the records for the last year for a given user usually quite efficient.

Databases are designed to try to retrieve data as efficient as possible, and thus will thus only as a last resort do a "linear scan" where the database has to enumerate over all records.

often however privacy rules require the bank to delete old records anyway. If I recall correctly, in the EU, banks are required to remove the transactions after a certain number of years (although they are required to keep these for a certain number of years as well).

huangapple
  • 本文由 发表于 2023年5月14日 20:02:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76247381.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定