
sql - How can I speed up this count + group by query on a large table?

Reposted. Author: 行者123. Updated: 2023-11-29 12:16:54

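I have a Postgres table with anonymous_id (string) and timestamp (datetime) columns, created by Segment.com when users visit our website.

There are ~5M rows and ~1M distinct anonymous_id values.

I want to query the number of distinct anonymous_ids found in each month.

So far I have this, which works, but it times out in PSequel (I can run it a few times with the date range limited)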

SELECT count(1), "month"
FROM (
    SELECT DISTINCT anonymous_id,
           date_trunc('month', "timestamp") AS "month"
    FROM pages
    -- WHERE "timestamp" between '2018-01-01' and '2018-02-01'
) AS dt
GROUP BY 2
ORDER BY 2

I have an index on each of anonymous_id and timestamp.

Result of EXPLAIN ANALYSE

                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1667977.72..1667978.22 rows=200 width=8) (actual time=115861.803..115861.807 rows=27 loops=1)
   Sort Key: (date_trunc('month'::text, pages."timestamp"))
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1667968.07..1667970.07 rows=200 width=8) (actual time=115861.763..115861.766 rows=27 loops=1)
         Group Key: (date_trunc('month'::text, pages."timestamp"))
         ->  Unique  (cost=1554502.82..1592324.57 rows=5042900 width=45) (actual time=97492.062..115468.396 rows=1158934 loops=1)
               ->  Sort  (cost=1554502.82..1567110.07 rows=5042900 width=45) (actual time=97492.060..113983.496 rows=5042900 loops=1)
                     Sort Key: pages.anonymous_id, (date_trunc('month'::text, pages."timestamp"))
                     Sort Method: external merge  Disk: 285936kB
                     ->  Seq Scan on pages  (cost=0.00..682820.25 rows=5042900 width=45) (actual time=0.088..25601.944 rows=5042900 loops=1)
 Planning time: 10.335 ms
 Execution time: 115910.353 ms
(12 rows)

Current indexes (including the composite index suggested by Thorsten Kettner below)

Indexes:
    "pages_pkey" PRIMARY KEY, btree (id)
    "idx_anonymous_id" btree (anonymous_id)
    "idx_date_trunc_anon_id" btree (date_trunc('month'::text, timezone('UTC'::text, "timestamp")), anonymous_id)
    "idx_path" btree (path)
    "idx_timestamp" btree ("timestamp")
    "idx_url" btree (url)
    "idx_user_id" btree (user_id)
    "pages_activity_type_idx" btree (activity_type)

Best answer

The only thing I can think of is to get rid of the derived table, since you don't need it:

SELECT count(distinct anonymous_id), date_trunc('month', "timestamp") AS "month"
FROM pages
GROUP BY date_trunc('month', "timestamp")
ORDER BY date_trunc('month', "timestamp");
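
As a sanity check on the rewrite above, here is a minimal sketch showing that the derived-table form from the question and the count(DISTINCT ...) form return the same monthly counts. It makes several assumptions that are not in the original post: it runs against an in-memory SQLite database rather than Postgres, it uses strftime('%Y-%m-01', ...) as a stand-in for date_trunc('month', ...), and the sample rows and the run_both helper are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (anonymous_id, timestamp); not data from the original post.
ROWS = [
    ("a", "2018-01-03 10:00:00"),
    ("a", "2018-01-15 12:30:00"),  # same visitor, same month: counted once
    ("b", "2018-01-20 08:00:00"),
    ("a", "2018-02-01 09:00:00"),
    ("c", "2018-02-11 17:45:00"),
    ("c", "2018-02-12 17:45:00"),
    ("b", "2018-03-05 11:00:00"),
]

# Assumption: SQLite has no date_trunc, so strftime stands in for
# date_trunc('month', "timestamp") and yields the first day of the month as text.
MONTH = "strftime('%Y-%m-01', \"timestamp\")"

# The question's form: deduplicate (anonymous_id, month) pairs, then count per month.
DERIVED_TABLE_QUERY = f"""
SELECT count(1), "month"
FROM (
    SELECT DISTINCT anonymous_id, {MONTH} AS "month"
    FROM pages
) AS dt
GROUP BY 2
ORDER BY 2
"""

# The answer's form: count distinct ids directly within each month group.
COUNT_DISTINCT_QUERY = f"""
SELECT count(DISTINCT anonymous_id), {MONTH} AS "month"
FROM pages
GROUP BY {MONTH}
ORDER BY {MONTH}
"""


def run_both():
    """Run both query forms on the sample rows and return their result lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE pages (anonymous_id TEXT, "timestamp" TEXT)')
    conn.executemany("INSERT INTO pages VALUES (?, ?)", ROWS)
    derived = conn.execute(DERIVED_TABLE_QUERY).fetchall()
    direct = conn.execute(COUNT_DISTINCT_QUERY).fetchall()
    conn.close()
    return derived, direct


if __name__ == "__main__":
    derived, direct = run_both()
    print(derived)
    print(direct)
```

This only checks that the two forms agree on results. It says nothing about speed on Postgres; whether the rewritten query is actually faster on the real table would need to be confirmed with EXPLAIN ANALYSE there.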

Regarding "sql - How can I speed up this count + group by query on a large table?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48584189/
