gpt4 book ai didi

python - 使用索引和更好的 SQL 在循环中优化查询集

转载 作者:行者123 更新时间:2023-11-28 20:54:59 26 4
gpt4 key购买 nike

我有一个 View 返回一些关于电子邮件列表增长的统计数据。涉及的机型有:

模型.py

class Contact(models.Model):
email_list = models.ForeignKey(EmailList, related_name='contacts')
customer = models.ForeignKey('Customer', related_name='contacts')
status = models.CharField(max_length=8)
create_date = models.DateTimeField(auto_now_add=True)


class EmailList(models.Model):
customers = models.ManyToManyField('Customer',
related_name='lists',
through='Contact')


class Customer(models.Model):
is_unsubscribed = models.BooleanField(default=False, db_index=True)
unsubscribe_date = models.DateTimeField(null=True, blank=True, db_index=True)

在 View 中,我正在做的是遍历所有 EmailLists 对象并获取一些指标:通过以下方式:

view.py

class ListHealthView(View):
def get(self, request, *args, **kwargs):
start_date, end_date = get_dates_from_querystring(request)

data = []
for email_list in EmailList.objects.all():
# historic data up to start_date
past_contacts = email_list.contacts.filter(
status='active',
create_date__lt=start_date).count()
past_unsubscribes = email_list.customers.filter(
is_unsubscribed=True,
unsubscribe_date__lt=start_date,
contacts__status='active').count()
past_deleted = email_list.contacts.filter(
status='deleted',
modify_date__lt=start_date).count()
# data for the given timeframe
new_contacts = email_list.contacts.filter(
status='active',
create_date__range=(start_date, end_date)).count()
new_unsubscribes = email_list.customers.filter(
is_unsubscribed=True,
unsubscribe_date__range=(start_date, end_date),
contacts__status='active').count()
new_deleted = email_list.contacts.filter(
status='deleted',
modify_date__range=(start_date, end_date)).count()

data.append({
'new_contacts': new_contacts,
'new_unsubscribes': new_unsubscribes,
'new_deleted': new_deleted,
'past_contacts': past_contacts,
'past_unsubscribes': past_unsubscribes,
'past_deleted': past_deleted,
})
return Response({'data': data})

现在这工作正常,但随着我的数据库开始增长,该 View 的响应时间超过 1 秒,偶尔会导致数据库中长时间运行查询。我认为最明显的改进是索引 EmailList.customers但我认为也许它需要是一个复合索引?另外,有更好的方法吗?也许使用聚合?

编辑

@bdoubleu 回答后,我尝试了以下操作:

data = (
EmailList.objects.annotate(
past_contacts=Count(Subquery(
Contact.objects.values('id').filter(
email_list=F('pk'),
status='active',
create_date__lt=start_date)
)),
past_deleted=Count(Subquery(
Contact.objects.values('id').filter(
email_list=F('pk'),
status='deleted',
modify_date__lt=start_date)
)),
)
.values(
'past_contacts', 'past_deleted',
)
)

我不得不改用 F而不是 OuterRef因为我意识到我的模型 EmailListid = HashidAutoField(primary_key=True, salt='...')造成 ProgrammingError: more than one row returned by a subquery used as an expression但我对此并不完全确定。

现在查询有效,但遗憾的是所有计数都返回为 0

最佳答案

因为您的代码正在为每个 EmailList 实例生成 6 个查询。对于 100 个实例,至少 600 个查询会减慢速度。

您可以使用 SubQuery() 进行优化表达式和 .values().

from django.db.models import Count, OuterRef, Subquery

data = (
EmailList.objects
.annotate(
past_contacts=Count(Subquery(
Contact.objects.filter(
email_list=OuterRef('pk'),
status='active',
create_date__lt=start_date
).values('id')
)),
past_unsubscribes=...,
past_deleted=...,
new_contacts=...,
new_unsubscribes=...,
new_deleted=...,
)
.values(
'past_contacts', 'past_unsubscribes',
'past_deleted', 'new_contacts',
'new_unsubscribes', 'new_deleted',
)
)

更新:对于旧版本的 Django,您的子查询可能需要如下所示

customers = (
Customer.objects
.annotate(
template_count=Subquery(
CustomerTemplate.objects
.filter(customer=OuterRef('pk'))
.values('customer')
.annotate(count=Count('*')).values('count')
)
).values('name', 'template_count')
)

关于python - 使用索引和更好的 SQL 在循环中优化查询集,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57893650/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com