How to safely and atomically decrement a counter with Django and PostgreSQL?

Question

I've been reading up on PostgreSQL transaction isolation and how that relates to Django's transaction.atomic() (e.g. this article, PostgreSQL docs), but I'm far from fluent in this topic and I'm not sure I understand what I've read.

We've got a PostgreSQL-backed Django app that involves quota objects. Simplified, it's just this:

```python
class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel)
    count = models.PositiveIntegerField()
```

An instance of this controls how many times a certain operation can be performed against the obj instance. count is initialized to a certain number, and will only ever decrement until it hits zero.

Any number of processes/threads can concurrently perform these operations. Basically, we need to atomically decrement (with UPDATE) the count of a single database row without deadlocking and without two processes/threads ever e.g. starting with a count of 100 and both trying to decrement it to 99.
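
To make that race concrete, here is a sketch (illustrative only, using the model above) of the naive read-modify-write that has to be avoided:

```python
# The lost-update race: processes A and B both read count=100, both compute
# 99 in Python, and both write 99, so one decrement is silently lost.
quota = Quota.objects.get(obj=instance)  # A and B both read count == 100
quota.count -= 1                         # A and B both compute 99
quota.save()                             # A and B both write 99
```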

My naive approach would be this:

```python
with transaction.atomic():
    cursor = connection.cursor()
    cursor.execute('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE')
    Quota.objects.filter(obj=instance).update(count=F('count')-1)
```

However, I'm not sure if this is subject to this issue, from the linked article:

> if at COMMIT the database cannot determine that the transaction could have been performed serially with respect to the read/writes of other transactions, then it will fail with a django.db.DatabaseError. This can happen even if they updated different rows.

All the processes/threads performing operations against the same obj would be decrementing the same column of the same row, so... maybe? I don't actually know what's involved in PostgreSQL "determin[ing] that the transaction could have been performed serially".

An alternate approach could be:

```python
with transaction.atomic():
    Quota.objects.select_for_update().filter(obj=instance).update(count=F('count')-1)
```

This seems to do row-level locking, and my understanding is that the isolation level change isn't needed, but I don't know if this is sufficient for correct handling of concurrent operations.

Is one of these approaches preferable here, and are some modifications still necessary to guarantee atomicity and deadlock avoidance? We could use something like python-redis-lock to also prevent concurrent DB operations at the Django view level, but this feels like a more natural fit to do at the DB level.

Answer 1

Score: 3

I cannot tell you what to do in Django, but I can explain it on the SQL level.

You can never deadlock by modifying only a single row. All you can ever get is a live lock, where one updating transaction has to wait until the previous one commits. This live lock cannot be avoided, it is the way that the database serializes modifications of the same row.

The only way you can get a deadlock is if several database transactions try to lock the same objects (plural!) in a different order.
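
For illustration, a minimal sketch of that ordering problem, assuming two hypothetical Quota rows locked with Django's select_for_update():

```python
from django.db import transaction

# Transaction A locks row 1 and then waits for row 2; concurrently,
# transaction B locks row 2 and then waits for row 1. Each waits on the
# other, and PostgreSQL aborts one of them with a deadlock error.
# Locking rows in a consistent order (e.g. by primary key) avoids this.
with transaction.atomic():                       # transaction A
    Quota.objects.select_for_update().get(id=1)
    Quota.objects.select_for_update().get(id=2)

with transaction.atomic():                       # transaction B, elsewhere
    Quota.objects.select_for_update().get(id=2)
    Quota.objects.select_for_update().get(id=1)
```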

Here are a few tips that you can use to avoid problems:

  • Keep your database transactions as short as possible, so that nobody has to wait long for a lock. That also reduces the risk of deadlocks.

  • Don't modify more data in a single transaction than is absolutely necessary for consistency. The more rows you modify (lock), the greater the risk of a deadlock.

  • Update the counter as the last activity before you commit (or as late as possible), so that the row stays locked for as short a time as possible. If you always update the counter as the last activity, you can never run into a deadlock on account of that update (see the sketch after this list)!

  • Use a check constraint on the column if you want to make sure that it never exceeds a certain value.
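
A minimal sketch of the decrement-last tip in Django terms, assuming the Quota model from the question (consume_quota and do_the_actual_work are hypothetical names for illustration):

```python
from django.db import transaction
from django.db.models import F

def consume_quota(instance):
    with transaction.atomic():
        do_the_actual_work(instance)  # hypothetical: the guarded operation
        # Decrement last, so the row lock is only held from here to COMMIT.
        Quota.objects.filter(obj=instance).update(count=F('count') - 1)
```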

Answer 2

Score: 2

From my experience, using isolation level serializable is indeed the way to go, and it seems to be the approach the postgres docs prefer for avoiding deadlocks:

at the cost of:

> This monitoring does not introduce any blocking beyond that present in repeatable read, but there is some overhead to the monitoring, and detection of the conditions which could cause a serialization anomaly will trigger a serialization failure.

> The monitoring of read/write dependencies has a cost, as does the restart of transactions which are terminated with a serialization failure,

but still often better than select for update:

> but balanced against the cost and blocking involved in use of explicit locks and SELECT FOR UPDATE or SELECT FOR SHARE, Serializable transactions are the best performance choice for some environments.

So you'd be better off using serializable, but that implies implementing a "retry" mechanism on the client side. In the case of a decrement implemented in SQL (as you did with F('count') - 1), that just means re-running your code if you get a 40001 error from postgres during commit.
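
As a hedged sketch of that retry loop, assuming the psycopg2 driver (which exposes the SQLSTATE as pgcode on the wrapped exception's __cause__) and the Quota model from the question; decrement_with_retry is a hypothetical name:

```python
from django.db import OperationalError, connection, transaction
from django.db.models import F

def decrement_with_retry(instance, attempts=3):
    for _ in range(attempts):
        try:
            with transaction.atomic():
                with connection.cursor() as cursor:
                    cursor.execute('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE')
                Quota.objects.filter(obj=instance).update(count=F('count') - 1)
            return True
        except OperationalError as exc:
            cause = exc.__cause__  # the underlying driver error, if any
            if getattr(cause, 'pgcode', None) == '40001':
                continue  # serialization failure: retry the whole transaction
            raise
    return False  # still conflicting after all attempts
```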

Answer 3

Score: -1

You can try database constraints; that way, the responsibility for keeping your data consistent is taken care of by your database.
Django has support for constraints, so you can check the documentation and experiment a little with it:

https://docs.djangoproject.com/en/4.2/ref/models/options/#django.db.models.Options.constraints

In your case it should be something like:

```python
class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel)
    count = models.PositiveIntegerField()

    class Meta:
        constraints = [
            models.CheckConstraint(check=models.Q(count__gte=0), name="count_gte_0"),
        ]
```

That way, when you save your changes, the database will check your data and raise an error if the constraint fails.
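
A short usage sketch, assuming that constraint is in place: a decrement that would take count below zero is rejected by the database, and Django surfaces it as an IntegrityError:

```python
from django.db import IntegrityError, transaction
from django.db.models import F

try:
    with transaction.atomic():
        Quota.objects.filter(obj=instance).update(count=F('count') - 1)
except IntegrityError:
    pass  # count was already 0; the check constraint rejected the update
```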

Answer 4

Score: -1

Something like this will stop your count from going negative.


```python
from django.db import models
from django.db.models import F

class Quota(models.Model):
    obj = models.OneToOneField(AnotherModel)
    count = models.PositiveIntegerField()

    def decrement(self):
        result = Quota.objects.select_for_update().filter(id=self.id, count__gt=0).update(count=F('count') - 1)
        return result  # number of rows the UPDATE matched: 0 or 1
```

Now you can check the result: if it is 0, the filter matched nothing, i.e. there was no row with count greater than 0 (so the count basically never goes below zero).

If it is 1, the database found a row and decremented it by 1.

Every time you call decrement, your code waits for the select_for_update lock until that row's lock is released (if there is one), so a deadlock might happen; you can use the nowait=True option, but beware of the exceptions it raises (doc).

You can also wrap this whole method in an atomic transaction, and even check whether anything went wrong so you can restore it to the last point.
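
Putting the pieces together, a hedged sketch; note it drops the select_for_update() chain, since the conditional UPDATE takes the row lock itself. Treat that simplification, and the try_decrement name, as assumptions rather than a correction of this answer:

```python
from django.db import transaction
from django.db.models import F

def try_decrement(quota_id):
    with transaction.atomic():
        rows = Quota.objects.filter(id=quota_id, count__gt=0).update(count=F('count') - 1)
    return rows == 1  # True: decremented; False: no row with count > 0
```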
