gpt4 book ai didi

sql - PostgreSQL distinct on + order by 查询优化

转载 作者:行者123 更新时间:2023-11-29 11:36:36 26 4
gpt4 key购买 nike

我在查询时遇到了一个小问题。

SELECT DISTINCT ON ("reporting_processedamazonsnapshot"."offer_id") *
FROM "reporting_processedamazonsnapshot" INNER JOIN
"offers_boooffer"
ON ("reporting_processedamazonsnapshot"."offer_id" =
"offers_boooffer"."id") INNER JOIN
"offers_offersettings"
ON ("offers_boooffer"."id" = "offers_offersettings"."offer_id")
WHERE "offers_offersettings"."account_id" = 20
ORDER BY "reporting_processedamazonsnapshot"."offer_id" ASC,
"reporting_processedamazonsnapshot"."scraping_date" DESC

我在 offer_id ASC, scraping_date DESC 上有一个名为 latest_scraping 的索引,但出于某种原因,PostgreSQL 在使用该索引后仍在进行排序,这导致了巨大的性能问题。

我不明白为什么它不使用已经排序的数据而不是重做排序。我的索引错了吗?或者我应该尝试以其他方式进行查询吗?

下面是解释 enter image description here及其实际数据

'Unique  (cost=21260.47..21263.06 rows=519 width=1288) (actual time=38053.685..38177.348 rows=1783 loops=1)'
' -> Sort (cost=21260.47..21261.76 rows=519 width=1288) (actual time=38053.683..38161.478 rows=153095 loops=1)'
' Sort Key: reporting_processedamazonsnapshot.offer_id, reporting_processedamazonsnapshot.scraping_date DESC'
' Sort Method: external merge Disk: 162088kB'
' -> Nested Loop (cost=41.90..21237.06 rows=519 width=1288) (actual time=70.874..36148.348 rows=153095 loops=1)'
' -> Nested Loop (cost=41.47..17547.90 rows=1627 width=8) (actual time=54.287..126.740 rows=1784 loops=1)'
' -> Bitmap Heap Scan on offers_offersettings (cost=41.04..4823.48 rows=1627 width=4) (actual time=52.532..84.102 rows=1784 loops=1)'
' Recheck Cond: (account_id = 20)'
' Heap Blocks: exact=38'
' -> Bitmap Index Scan on offers_offersettings_account_id_fff7a8c0 (cost=0.00..40.63 rows=1627 width=0) (actual time=49.886..49.886 rows=4132 loops=1)'
' Index Cond: (account_id = 20)'
' -> Index Only Scan using offers_boooffer_pkey on offers_boooffer (cost=0.43..7.81 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1784)'
' Index Cond: (id = offers_offersettings.offer_id)'
' Heap Fetches: 1784'
' -> Index Scan using latest_scraping on reporting_processedamazonsnapshot (cost=0.43..1.69 rows=58 width=1288) (actual time=0.526..20.146 rows=86 loops=1784)'
' Index Cond: (offer_id = offers_boooffer.id)'
'Planning time: 187.133 ms'
'Execution time: 38195.266 ms'

最佳答案

要使用索引来避免排序,PostgreSQL 首先必须按索引顺序扫描所有 "reporting_processedamazonsnapshot",然后加入所有 strong> "offers_boooffer" 使用嵌套循环连接(以便保留顺序)然后连接所有 "offers_offersettings",再次使用嵌套循环连接

最后,所有不符合条件 "offers_offersettings"."account_id"= 20 的行都将被丢弃。

PostgreSQL 相信——在我看来是正确的——首先使用条件尽可能减少行数,然后使用最有效的连接方法连接表,然后对 DISTINCT 子句。

我想知道下面的查询是否会更快:

SELECT DISTINCT ON (q.offer_id) *
FROM offers_offersettings ofs
JOIN offers_boooffer bo ON bo.id = ofs.offer_id
CROSS JOIN LATERAL
(SELECT *
FROM reporting_processedamazonsnapshot r
WHERE r.offer_id = bo.offer_id
ORDER BY r.scraping_date DESC
LIMIT 1) q
WHERE ofs.account_id = 20
ORDER BY q.offer_id ASC, q.scraping_date DESC;

执行计划将是相似的,除了必须从索引中扫描更少的行,这应该会减少您最需要的执行时间。

如果您想加快排序速度,请将该查询的 work_mem 增加到大约 500MB(如果您负担得起的话)。

关于sql - PostgreSQL distinct on + order by 查询优化,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51623084/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com