查询速度太慢; prefetch_related 无法解决问题

huangapple go评论74阅读模式
英文:

Queries are too slow; prefetch_related not solving the problem

问题

我们正在使用Django 2.1来开发Speedy Net。我有一些页面,每页显示大约96个用户,并且对于每个用户,我想显示他在Speedy Match上有多少个朋友,并且这些朋友拥有活跃的电子邮件地址。查询检查每个用户是否满足 (self.email_addresses.filter(is_confirmed=True).exists()) 条件:

def has_confirmed_email(self):
    return (self.email_addresses.filter(is_confirmed=True).exists())

对于每个96个用户中的每个用户,它都会运行这个查询 - 每页超过数百次。用于获取用户的查询是 User.objects.all().order_by(<...>),然后对于每个用户,它会运行以下查询:

qs = self.friends.all().prefetch_related("from_user", "from_user__{}".format(SpeedyNetSiteProfile.RELATED_NAME), "from_user__{}".format(SpeedyMatchSiteProfile.RELATED_NAME), "from_user__email_addresses").distinct().order_by('-from_user__{}__last_visit'.format(SiteProfile.RELATED_NAME))

我在用户管理模型中添加了 prefetch_related

def get_queryset(self):
    from speedy.net.accounts.models import SiteProfile as SpeedyNetSiteProfile
    from speedy.match.accounts.models import SiteProfile as SpeedyMatchSiteProfile
    return super().get_queryset().prefetch_related(SpeedyNetSiteProfile.RELATED_NAME, SpeedyMatchSiteProfile.RELATED_NAME, "email_addresses").distinct()

但是将 "email_addresses" 和 "from_user__email_addresses" 添加到 prefetch_related 并没有使页面加载速度更快 - 加载页面大约需要16秒。如果在不检查每个朋友是否有已确认的电子邮件地址的情况下加载页面,加载时间大约为3秒。是否有一种方法可以一次性加载所有用户的电子邮件地址,而不是每次检查一个用户时都加载一次?实际上,我也希望朋友的查询只加载一次,而不是每页加载96次(每个用户加载一次),但是页面加载速度为3秒,所以这并不是太重要。但如果我可以查询朋友表一次,那会更好。

这些查询是由以下代码行引起的(链接):

if ((self.user.has_confirmed_email()) and (step >= self.activation_step)):

这是由 is_active_and_valid 调用的,它又被 get_matching_rank 调用,以检查用户是否与特定用户匹配。这是模型中的 get_friends 方法调用的。

更新#1: 如果我在模型的 def has_confirmed_email(...) 中更改为 return True,那么页面加载速度仅会快3秒(从16秒变为13秒),因此可能在页面中还存在其他与性能相关的问题。

如果我禁用 get_matching_rank 的功能,并用普通的 return 5 替换它,页面加载速度会快得多。但当然,我们需要此函数的功能。也许我们可以在为两个特定用户集合调用此函数时,只缓存该函数的结果几分钟?

更新#2: 我想在用户模型中添加一个布尔字段,如果用户有已确认的电子邮件地址,则为 true。并且每次保存或删除电子邮件地址时都会更新此字段。我知道如何覆盖保存方法,但是当删除电子邮件地址时,如何更新此字段?管理员也可以删除它。

我认为我应该使用信号,比如 post_savepost_delete

英文:

We are using Django 2.1 for Speedy Net. I have pages which display about 96 users per page, and for each user I want to display how many friends he has on Speedy Match, with an active email address. The query checks for each user if (self.email_addresses.filter(is_confirmed=True).exists()) is true:

def has_confirmed_email(self):
    return (self.email_addresses.filter(is_confirmed=True).exists())

For each user of 96 users, it checks all his friends and runs this query - more than hundreds of times per page. The query for fetching the users is User.objects.all().order_by(<...>), and then for each user it checks this query:

qs = self.friends.all().prefetch_related("from_user", "from_user__{}".format(SpeedyNetSiteProfile.RELATED_NAME), "from_user__{}".format(SpeedyMatchSiteProfile.RELATED_NAME), "from_user__email_addresses").distinct().order_by('-from_user__{}__last_visit'.format(SiteProfile.RELATED_NAME))

I added prefetch_related in the User's manager model:

def get_queryset(self):
    from speedy.net.accounts.models import SiteProfile as SpeedyNetSiteProfile
    from speedy.match.accounts.models import SiteProfile as SpeedyMatchSiteProfile
    return super().get_queryset().prefetch_related(SpeedyNetSiteProfile.RELATED_NAME, SpeedyMatchSiteProfile.RELATED_NAME, "email_addresses").distinct()

But adding "email_addresses" and "from_user__email_addresses" to prefetch_related doesn't make the page load faster - it takes about 16 seconds to load the page. When loading the page without checking if each friend has a confirmed email address it takes about 3 seconds to load the page. Is there a way I can load all the email addresses of the users once and not each time a user is checked? Actually I would also like the friends query to be loaded once and not 96 times per page (once for each user), but the page loads in 3 seconds so it doesn't matter that much. But if I could query the friends table once it would have been better.

The queries are caused by the following line (link):

if ((self.user.has_confirmed_email()) and (step >= self.activation_step)):

This is called by is_active_and_valid which is called by get_matching_rank, to check if the user is a match of the specific user. This is called by method get_friends in the model.

Update #1: If I change to return True in def has_confirmed_email(...) in the model, the page loads only 3 seconds faster (13 instead of 16) so there might be more performance-related issues in this page.

If I disable the functionality of get_matching_rank and replace it with a plain return 5, the page loads much faster. But of course we need the functionality of this function. Maybe we can just cache for a few minutes the results of this function when called for sets of two specific users?

Update #2: I want to add a boolean field to the user model, which will be true if the user has a confirmed email address. And this field will be updated each time an email address is saved or deleted. I know how to override the save method, but how do I update this field when an email address gets deleted? It may also be deleted by the admin.

I think I should use signals such as post_save and post_delete.

答案1

得分: 1

为了使预取操作生效,您需要在用户模型上使用它 - 从您提供的内容很难判断您是否正在这样做。

在不预取好友信息的情况下,对每个用户执行 self.friends.all() 将导致查询。要通过预取绕过查询,您可以采取以下一种方法之一:

User.objects.prefetch_related('friends')

或者,您可以使用 Prefetch 对象进一步筛选:

from django.db.models import Prefetch

User.objects.prefetch_related(Prefetch(
    'friends',
    queryset=Friend.objects.filter(is_confirmed=True)
))

使用带有过滤关键字参数的 Count 注释会更快。

from django.db.models import Count, Q

qs = User.objects.annotate(
    friend_count=Count('friends', filter=Q(friends__is_confirmed=True))
)
英文:

For the prefetch to have any effect you would have to be using it on the User model - it's hard to tell if you're doing that from what you've included.

Without prefetching friends for each user doing self.friends.all() is going to cause a query. To get around the query using prefetch you could do one of the following:

User.objects.prefetch_related('friends')

Or you could use a Prefetch object to further filter:

User.objects.prefetch_related(Prefetch(
    'friends',
    queryset=Friend.objects.filter(is_confirmed=True)
)

A Count annotation using the filter keyword argument is going to be much quicker.

from djang.db.models import Count, Q

qs = User.objects.annotate(
    friend_count=Count('friends', filter=Q(friends__is_confirmed=True)
)

答案2

得分: 1

添加 "email_addresses" 和 "from_user__email_addresses" 到 prefetch_related 并不会使页面加载更快...

这是因为 self.email_addresses.filter(is_confirmed=True).exists() 没有使用预取的 QuerySet

要使用预取的 self.email_addresses,在内存中进行筛选:

def has_confirmed_email(self):
    if self.email_addresses.all()._result_cache is not None:
        return any(email_address.is_confirmed for email_address in self.email_addresses.all())

    return (self.email_addresses.filter(is_confirmed=True).exists())

注意:如果没有进行预取,那么改进的实现仍然会在每次 has_confirmed_email 函数调用时访问数据库,因为 .filter 仍然会创建一个新的 QuerySet。为了处理这个问题,可以将 has_confirmed_email 设置为 Django 的 @cached_property

解释

来自 https://docs.djangoproject.com/en/3.0/ref/models/querysets/#prefetch-related:

记住,就像对待 QuerySets 一样,任何随后的链式方法都意味着不同的数据库查询,将忽略之前缓存的结果,并使用新的数据库查询来检索数据。...

>>> pizzas = Pizza.objects.prefetch_related('toppings')
>>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]

预取的缓存在这里无法帮助;事实上,它会影响性能,因为您已经执行了一个未使用的数据库查询。因此要谨慎使用这个功能!

英文:

> But adding "email_addresses" and "from_user__email_addresses" to prefetch_related doesn't make the page load faster ...

That is because self.email_addresses.filter(is_confirmed=True).exists() doesn't use the prefetched QuerySet.

To use the prefetched self.email_addresses, filter in memory:

def has_confirmed_email(self):
    if self.email_addresses.all()._result_cache is not None:
        return any(email_address.is_confirmed for email_address in self.email_addresses.all())

    return (self.email_addresses.filter(is_confirmed=True).exists())

Note: If not prefetched, then the improved implementation still hits the database on every has_confirmed_email function call since the .filter still creates a new QuerySet. To handle this, make has_confirmed_email a Django @cached_property.

Explanation

From https://docs.djangoproject.com/en/3.0/ref/models/querysets/#prefetch-related:

> Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query. ...
>
> python
> >>> pizzas = Pizza.objects.prefetch_related('toppings')
> >>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]
>

>
> ... The prefetched cache can’t help here; in fact it hurts performance, since you have done a database query that you haven’t used. So use this feature with caution!

答案3

得分: 0

我在用户模型中添加了一个字段:

has_confirmed_email = models.BooleanField(default=False)

还添加了以下方法:

def _update_has_confirmed_email_field(self):
    self.has_confirmed_email = (self.email_addresses.filter(is_confirmed=True).count() > 0)
    self.save_user_and_profile()

以及以下信号处理器:

@receiver(signal=models.signals.post_save, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_saving_email_address(sender, instance: UserEmailAddress, **kwargs):
    instance.user._update_has_confirmed_email_field()

@receiver(signal=models.signals.post_delete, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_deleting_email_address(sender, instance: UserEmailAddress, **kwargs):
    instance.user._update_has_confirmed_email_field()

在用户模型中还有以下方法:

def delete(self, *args, **kwargs):
    if ((self.is_staff) or (self.is_superuser)):
        warnings.warn('Can’t delete staff user.')
        return False
    else:
        self.email_addresses.all().delete() # 这是因为上面的信号处理器而必需的。
        return super().delete(*args, **kwargs)

我还从管理员视图中移除了好友计数,现在管理员视图页面加载大约需要1.5秒。

英文:

I added a field to the User model:

has_confirmed_email = models.BooleanField(default=False)

And the method:

def _update_has_confirmed_email_field(self):
    self.has_confirmed_email = (self.email_addresses.filter(is_confirmed=True).count() > 0)
    self.save_user_and_profile()

And:

@receiver(signal=models.signals.post_save, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_saving_email_address(sender, instance: UserEmailAddress, **kwargs):
	instance.user._update_has_confirmed_email_field()


@receiver(signal=models.signals.post_delete, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_deleting_email_address(sender, instance: UserEmailAddress, **kwargs):
	instance.user._update_has_confirmed_email_field()

And in the User model:

def delete(self, *args, **kwargs):
    if ((self.is_staff) or (self.is_superuser)):
        warnings.warn('Can’t delete staff user.')
        return False
    else:
        self.email_addresses.all().delete() # This is necessary because of the signal above.
        return super().delete(*args, **kwargs)

I also removed the friends count from the admin view, and now the admin view pages load in about 1.5 seconds.

huangapple
  • 本文由 发表于 2020年1月3日 19:59:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/59578263.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定