
postgresql - Postgres HashAggregate running slowly

Reposted. Author: 行者123. Updated: 2023-11-29 13:16:11

The query below runs against a very large table (300MM, i.e. 300 million rows) and takes just under half a second.

SELECT  ld.user_profile_id,
CASE WHEN ld.first_name IS NULL THEN '' ELSE UPPER(ld.first_name) END as first_name,
CASE WHEN ld.last_name IS NULL THEN '' ELSE UPPER(ld.last_name) END as last_name
FROM data ld
WHERE ld.hashed_ssn = 'xxxx'

UNION

SELECT ld.user_profile_id,
CASE WHEN ld.first_name IS NULL THEN '' ELSE UPPER(ld.first_name) END as first_name,
CASE WHEN ld.last_name IS NULL THEN '' ELSE UPPER(ld.last_name) END as last_name
FROM data ld
WHERE ld.email = '123@GMAIL.COM'

UNION

SELECT ld.user_profile_id,
CASE WHEN ld.first_name IS NULL THEN '' ELSE UPPER(ld.first_name) END as first_name,
CASE WHEN ld.last_name IS NULL THEN '' ELSE UPPER(ld.last_name) END as last_name
FROM data ld
WHERE ld.home_phone = '5512345678';

Here is the execution plan:

HashAggregate  (cost=11774.31..11802.68 rows=2837 width=110) (actual time=404.978..405.003 rows=12 loops=1)
  Group Key: ld.user_profile_id, (CASE WHEN (ld.first_name IS NULL) THEN ''::text ELSE upper((ld.first_name)::text) END), (CASE WHEN (ld.last_name IS NULL) THEN ''::text ELSE upper((ld.last_name)::text) END)
  ->  Append  (cost=0.70..11696.29 rows=2837 width=110) (actual time=3.935..402.438 rows=1131 loops=1)
        ->  Index Scan using hashed_ssn_index_data on data ld  (cost=0.70..4425.12 rows=1074 width=110) (actual time=3.935..353.560 rows=376 loops=1)
              Index Cond: ((hashed_ssn)::text = 'XXXX'::text)
        ->  Index Scan using email_index_data on data ld_1  (cost=0.57..3349.39 rows=814 width=110) (actual time=3.834..30.938 rows=379 loops=1)
              Index Cond: ((email)::text = '123@GMAIL.COM'::text)
        ->  Index Scan using home_phone_index_data on data ld_2  (cost=0.57..3893.41 rows=949 width=110) (actual time=3.429..17.663 rows=376 loops=1)
              Index Cond: ((home_phone)::text = '55512345678'::text)
Planning time: 55.073 ms
Execution time: 405.311 ms

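I did add indexes on hashed_ssn, email and home_phone, but I'm still not getting the speed I'm looking for. Is this simply because the table is so large? Any suggestions on how to speed up this query?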

Thank you in advance.

Best answer

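You are retrieving about 400 rows that are located at random places on disk. The hashed_ssn index scan alone accounts for about 354 ms of the 405 ms total. Fetching those rows probably takes around 1000 I/O operations, and how fast that goes depends on how many IOPS (I/O operations per second) your storage can deliver.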
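One way to check whether the time really goes into random I/O is to rerun a single branch of the query with the BUFFERS option. In the output, "shared hit" counts pages already in PostgreSQL's shared buffers. "shared read" counts pages that had to be requested from the operating system, which may still serve some of them from its own cache. With track_io_timing enabled, an I/O Timings line shows how much of each node's time was spent reading. This is a diagnostic sketch: 'xxxx' is the question's placeholder value, and changing track_io_timing normally requires superuser rights or an explicitly granted SET privilege.

```sql
-- Report time spent in I/O (normally requires superuser or a granted SET privilege)
SET track_io_timing = on;

-- BUFFERS adds "shared hit" (found in shared buffers) and
-- "shared read" (requested from the OS) counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT ld.user_profile_id, ld.first_name, ld.last_name
FROM data ld
WHERE ld.hashed_ssn = 'xxxx';
```

A large "shared read" count on the Index Scan node, together with I/O time close to the node's actual time, would support the explanation above. Running the same statement a second time often looks much faster, because the pages are then cached.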

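If your table doesn't change often, you can force it to store rows with the same hashed_ssn physically close to each other. That way the query doesn't have to fetch as many random blocks from storage.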
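Before clustering, the planner's statistics give a rough idea of how ordered the rows already are. The correlation column of the pg_stats view measures how closely a column's values follow the physical row order. Values near 1 or -1 mean neighbouring values sit in neighbouring blocks. Values near 0 mean they are scattered. This minimal check assumes the table has been ANALYZEd recently (the statistics are sampled) and that tablename 'data' is unambiguous across schemas.

```sql
-- correlation near +1/-1: rows stored roughly in index order (few random reads)
-- correlation near 0: rows scattered across the table (many random reads)
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'data'
  AND attname IN ('hashed_ssn', 'email', 'home_phone');
```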

alter table data cluster on hashed_ssn_index_data;  -- record which index to order the table by
cluster data;                                       -- rewrite the table in that index's order

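This will take a long time, and it blocks even reads from the table while it runs: CLUSTER takes an ACCESS EXCLUSIVE lock. PostgreSQL does not try to maintain this row order on INSERT/UPDATE, so if you keep changing the data, performance will gradually degrade. You can, however, re-run cluster data; periodically.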
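For periodic re-clustering, it can be worth first confirming which index CLUSTER will use. When CLUSTER is run without an index name, it reuses the index recorded by ALTER TABLE ... CLUSTER ON, which is flagged as indisclustered in pg_index. The PostgreSQL documentation also recommends running ANALYZE afterwards, because the planner's statistics include the table's physical ordering. This is a sketch of a maintenance run. It assumes the run is scheduled in a window where blocking reads and writes on data is acceptable, and where there is free disk space for a full copy of the table and its indexes.

```sql
-- Which index is recorded as the clustering index for table "data"?
SELECT indexrelid::regclass AS clustering_index
FROM pg_index
WHERE indrelid = 'data'::regclass
  AND indisclustered;

-- Rewrite the table in that index's order (ACCESS EXCLUSIVE lock until done)
CLUSTER data;

-- Refresh planner statistics, which include the new physical ordering
ANALYZE data;
```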

Regarding "postgresql - Postgres HashAggregate running slowly", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48632021/
