gpt4 book ai didi

performance - PostgreSQL Simple JOIN 很慢

转载 作者:行者123 更新时间:2023-11-29 11:14:50 25 4
gpt4 key购买 nike

我有一个简单的查询和两个表:

向下钻取

CREATE SEQUENCE drilldown_id_seq;

CREATE TABLE drilldown (
transactionid bigint NOT NULL DEFAULT nextval('drilldown_id_seq'),
userid bigint NOT NULL default 0 REFERENCES users(id),
pathid bigint NOT NULL default 0,
reqms bigint NOT NULL default 0,
quems bigint NOT NULL default 0,
clicktime timestamp default current_timestamp,
PRIMARY KEY(transactionid)
);

ALTER SEQUENCE drilldown_id_seq OWNED BY drilldown.transactionid;

CREATE INDEX drilldown_idx1 ON drilldown (clicktime);

querystats

CREATE SEQUENCE querystats_id_seq;

CREATE TABLE querystats (
id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
queryms bigint NOT NULL default 0,
PRIMARY KEY(id)
);

ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;

CREATE INDEX querystats_idx1 ON querystats (transactionid);
CREATE INDEX querystats_idx2 ON querystats (querynameid);

drilldown 有 150 万条记录,querystats 有 1000 万条记录;当我在两者之间加入时,问题就发生了。

查询

explain analyse
select avg(qs.queryms)
from querystats qs
join drilldown d on (qs.transactionid=d.transactionid)
where querynameid=1;

查询计划

Aggregate  (cost=528596.96..528596.97 rows=1 width=8) (actual time=5213.154..5213.154 rows=1 loops=1)   ->  Hash Join  (cost=274072.53..518367.59 rows=4091746 width=8) (actual time=844.087..3528.788 rows=4117717 loops=1)         Hash Cond: (qs.transactionid = d.transactionid)         ->  Bitmap Heap Scan on querystats qs  (cost=88732.62..210990.44 rows=4091746 width=16) (actual time=309.502..1321.029 rows=4117717 loops=1)               Recheck Cond: (querynameid = 1)               ->  Bitmap Index Scan on querystats_idx2  (cost=0.00..87709.68 rows=4091746 width=0) (actual time=307.916..307.916 rows=4117718 loops=1)                     Index Cond: (querynameid = 1)         ->  Hash  (cost=162842.29..162842.29 rows=1371250 width=8) (actual time=534.065..534.065 rows=1372574 loops=1)               Buckets: 4096  Batches: 64  Memory Usage: 850kB               ->  Index Scan using drilldown_pkey on drilldown d  (cost=0.00..162842.29 rows=1371250 width=8) (actual time=0.015..364.657 rows=1372574 loops=1) Total runtime: 5213.205 ms(11 rows)

I know there are some tuning parameters I can adjust for PostgreSQL, but what I want to know is the query I am doing the most optimal way of joing the two tables?

Or maybe some sort of INNER JOIN? I'm just not sure.

Any pointers are appreciated!

EDIT

database#\d drilldown
Table "public.drilldown"
Column | Type | Modifiers
---------------+-----------------------------+--------------------------------------------------------
transactionid | bigint | not null default nextval('drilldown_id_seq'::regclass)
userid | bigint | not null default 0
pathid | bigint | not null default 0
reqms | bigint | not null default 0
quems | bigint | not null default 0
clicktime | timestamp without time zone | default now()
Indexes:
"drilldown_pkey" PRIMARY KEY, btree (transactionid)
"drilldown_idx1" btree (clicktime)
Foreign-key constraints:
"drilldown_userid_fkey" FOREIGN KEY (userid) REFERENCES users(id)
Referenced by:
TABLE "querystats" CONSTRAINT "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

database=# \d querystats
Table "public.querystats"
Column | Type | Modifiers
---------------+--------+---------------------------------------------------------
id | bigint | not null default nextval('querystats_id_seq'::regclass)
transactionid | bigint | not null default 0
querynameid | bigint | not null default 0
queryms | bigint | not null default 0
Indexes:
"querystats_pkey" PRIMARY KEY, btree (id)
"querystats_idx1" btree (transactionid)
"querystats_idx2" btree (querynameid)
Foreign-key constraints:
"querystats_querynameid_fkey" FOREIGN KEY (querynameid) REFERENCES queryname(id)
"querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

所以这是请求的两个表和版本

PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

所以这个查询所做的是从每个查询类型 (querynameid) 的 queryms 的所有行值中获取平均值

            name            |         current_setting          |        source        
----------------------------+----------------------------------+----------------------
application_name | psql | client
client_encoding | UTF8 | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
enable_seqscan | off | session
external_pid_file | /var/run/postgresql/9.1-main.pid | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
log_line_prefix | %t | configuration file
log_timezone | localtime | environment variable
max_connections | 100 | configuration file
max_stack_depth | 2MB | environment variable
port | 5432 | configuration file
shared_buffers | 24MB | configuration file
ssl | on | configuration file
TimeZone | localtime | environment variable
unix_socket_directory | /var/run/postgresql | configuration file
(19 rows)

我看到 enable_seqscan=off,我没有触及任何设置,这是一个完全默认的安装。

更新

我根据以下评论做了一些修改,结果如下。

explain analyse SELECT (SELECT avg(queryms) AS total FROM querystats WHERE querynameid=3) as total FROM querystats qs JOIN drilldown d ON (qs.transactionid=d.transactionid) WHERE qs.querynameid=3 limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=196775.99..196776.37 rows=1 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (cost=196775.94..196775.99 rows=1 width=8) (actual time=2320.815..2320.815 rows=1 loops=1)
-> Bitmap Heap Scan on querystats (cost=24354.25..189291.69 rows=2993698 width=8) (actual time=226.516..1144.690 rows=2999798 loops=1)
Recheck Cond: (querynameid = 3)
-> Bitmap Index Scan on querystats_idx (cost=0.00..23605.83 rows=2993698 width=0) (actual time=225.119..225.119 rows=2999798 loops=1)
Index Cond: (querynameid = 3)
-> Nested Loop (cost=0.00..1127817.12 rows=2993698 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
-> Seq Scan on drilldown d (cost=0.00..76745.10 rows=1498798 width=8) (actual time=0.009..0.009 rows=1 loops=1)
-> Index Scan using querystats_idx on querystats qs (cost=0.00..0.60 rows=2 width=8) (actual time=0.045..0.045 rows=1 loops=1)
Index Cond: ((querynameid = 3) AND (transactionid = d.transactionid))
Total runtime: 2320.940 ms
(12 rows)

最佳答案

它的行为就像您设置了 enable_seqscan = off 一样,因为它使用索引扫描来填充哈希表。除非作为诊断步骤,否则切勿关闭任何计划程序选项,如果您要显示计划,请显示使用的任何选项。这可以运行以显示很多有用的信息:

SELECT version();
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

如果您告诉我们有关运行时环境的信息,尤其是机器上的 RAM 容量、存储系统的外观以及数据库的大小(或者更好,事件数据集 数据库中经常引用的数据)。

作为粗略的分解,5.2 秒分解为:

  1. 1.3 秒找到符合您的选择标准的 4,117,717 querystats 行。
  2. 2.3 秒随机匹配那些与 drilldown 记录。
  3. 1.6 秒通过 4,117,717 行并计算平均值。

因此,即使您似乎削弱了它使用最快计划的能力,它只需要 1.26 微秒(百万分之一秒)来定位每一行,将其连接到另一行,并将其用于计算平均的。这在绝对基础上并不算太糟糕,但您几乎可以肯定会得到一个稍微更快的计划。

首先,如果您使用的是 9.2.x,其中 x 小于 3,请立即升级到 9.2.3。某些类型的计划存在性能退化,已在最近的版本中修复,这可能会影响此查询。一般来说,try to stay up-to-date on minor releases (版本号更改超过第二个点)。

您可以在单个 session 中测试不同的计划,方法是仅在该连接上设置计划因素并运行您的查询(或对其执行 EXPLAIN)。尝试这样的事情:

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET cpu_tuple_cost = 0.05;
SET effective_cache_size = '3GB'; -- actually use shared_buffers plus OS cache

确保所有enable_ 设置都on

关于performance - PostgreSQL Simple JOIN 很慢,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/14793708/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com