performance - Subquery with groupBy is much slower than querying via joins

I am using sequelize to run some queries against my postgres database. Because of the pagination I am doing, I found that I had to use a subquery and group by the primary key of the main model I am querying. While this solved the problem of not getting a full page of results, the query is much slower (3200 ms vs. 60 ms). Sadly, I am not enough of a SQL expert to see what I could do to speed it up to acceptable performance.

The sequelize query I am running is:

var query = {
  limit: 10,
  where: {},
  include: [
    {model: db.FinancialCompany, through: {where: {address_zip: req.query.zip}}, required: true},
    {model: db.Disclosure, required: false}
  ],
  order: [['last_name', 'ASC']],
  groupBy: ['FinancialProfessional.id'],
  subQuery: true
};
db.FinancialProfessional.findAndCount(query).then(function (professionals) {
  res.jsonp(professionals);
  return professionals;
});

which gets translated to:

SELECT "FinancialProfessional".*,
"FinancialCompanies"."id" AS "FinancialCompanies.id",
"FinancialCompanies"."name" AS "FinancialCompanies.name",
"FinancialCompanies"."address_street" AS "FinancialCompanies.address_street",
"FinancialCompanies"."address_city" AS "FinancialCompanies.address_city",
"FinancialCompanies"."address_state" AS "FinancialCompanies.address_state",
"FinancialCompanies"."address_zip" AS "FinancialCompanies.address_zip",
"FinancialCompanies"."crd" AS "FinancialCompanies.crd",
"FinancialCompanies"."createdAt" AS "FinancialCompanies.createdAt",
"FinancialCompanies"."updatedAt" AS "FinancialCompanies.updatedAt",
"FinancialCompanies.ProfessionalToCompany"."address_street" AS "FinancialCompanies.ProfessionalToCompany.address_street",
"FinancialCompanies.ProfessionalToCompany"."address_city" AS "FinancialCompanies.ProfessionalToCompany.address_city",
"FinancialCompanies.ProfessionalToCompany"."address_state" AS "FinancialCompanies.ProfessionalToCompany.address_state",
"FinancialCompanies.ProfessionalToCompany"."address_zip" AS "FinancialCompanies.ProfessionalToCompany.address_zip",
"FinancialCompanies.ProfessionalToCompany"."createdAt" AS "FinancialCompanies.ProfessionalToCompany.createdAt",
"FinancialCompanies.ProfessionalToCompany"."updatedAt" AS "FinancialCompanies.ProfessionalToCompany.updatedAt",
"FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId" AS "FinancialCompanies.ProfessionalToCompany.FinancialCompanyId",
"FinancialCompanies.ProfessionalToCompany"."FinancialProfessionalId" AS "FinancialCompanies.ProfessionalToCompany.FinancialProfessionalId",
"Disclosures"."id" AS "Disclosures.id",
"Disclosures"."info" AS "Disclosures.info",
"Disclosures"."createdAt" AS "Disclosures.createdAt",
"Disclosures"."updatedAt" AS "Disclosures.updatedAt",
"Disclosures"."FinancialProfessionalId" AS "Disclosures.FinancialProfessionalId",
"Disclosures"."RegulatoryAgencyId" AS "Disclosures.RegulatoryAgencyId"
FROM
(SELECT "FinancialProfessional"."id",
"FinancialProfessional"."full_name",
"FinancialProfessional"."last_name",
"FinancialProfessional"."alternate_names",
"FinancialProfessional"."title",
"FinancialProfessional"."crd",
"FinancialProfessional"."licensed",
"FinancialProfessional"."display_count",
"FinancialProfessional"."years_f",
"FinancialProfessional"."years_s",
"FinancialProfessional"."createdAt",
"FinancialProfessional"."updatedAt",
"FinancialProfessional"."UserId"
FROM "FinancialProfessionals" AS "FinancialProfessional"
WHERE
(SELECT "ProfessionalToCompany"."FinancialCompanyId"
FROM "ProfessionalToCompanies" AS "ProfessionalToCompany"
INNER JOIN "FinancialCompanies" AS "FinancialCompany" ON "ProfessionalToCompany"."FinancialCompanyId" = "FinancialCompany"."id"
WHERE ("FinancialProfessional"."id" = "ProfessionalToCompany"."FinancialProfessionalId"
AND "ProfessionalToCompany"."address_zip" = '94596') LIMIT 1) IS NOT NULL
GROUP BY "FinancialProfessional"."id"
ORDER BY "FinancialProfessional"."last_name" ASC LIMIT 10) AS "FinancialProfessional"
INNER JOIN ("ProfessionalToCompanies" AS "FinancialCompanies.ProfessionalToCompany"
INNER JOIN "FinancialCompanies" AS "FinancialCompanies" ON "FinancialCompanies"."id" = "FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId"
AND "FinancialCompanies.ProfessionalToCompany"."address_zip" = '94596') ON "FinancialProfessional"."id" = "FinancialCompanies.ProfessionalToCompany"."FinancialProfessionalId"
LEFT OUTER JOIN "Disclosures" AS "Disclosures" ON "FinancialProfessional"."id" = "Disclosures"."FinancialProfessionalId"
ORDER BY "FinancialProfessional"."last_name" ASC;

Running EXPLAIN ANALYZE on the query yields:

Nested Loop Left Join  (cost=17155066.40..17155166.22 rows=1 width=2423) (actual time=5098.656..5098.780 rows=12 loops=1)
  ->  Nested Loop  (cost=17155065.98..17155157.78 rows=1 width=2343) (actual time=5098.648..5098.736 rows=10 loops=1)
        ->  Nested Loop  (cost=17155065.69..17155149.94 rows=1 width=227) (actual time=5098.642..5098.702 rows=10 loops=1)
              ->  Limit  (cost=17155065.27..17155065.29 rows=10 width=161) (actual time=5098.618..5098.624 rows=10 loops=1)
                    ->  Sort  (cost=17155065.27..17158336.49 rows=1308489 width=161) (actual time=5098.617..5098.618 rows=10 loops=1)
                          Sort Key: "FinancialProfessional".last_name
                          Sort Method: top-N heapsort  Memory: 27kB
                          ->  Group  (cost=0.43..17126789.29 rows=1308489 width=161) (actual time=10.895..5096.539 rows=909 loops=1)
                                Group Key: "FinancialProfessional".id
                                ->  Index Scan using "FinancialProfessionals_pkey" on "FinancialProfessionals" "FinancialProfessional"  (cost=0.43..17123518.07 rows=1308489 width=161) (actual time=10.893..5095.345 rows=909 loops=1)
                                      Filter: ((SubPlan 1) IS NOT NULL)
                                      Rows Removed by Filter: 1314155
                                      SubPlan 1
                                        ->  Limit  (cost=0.71..12.76 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1315064)
                                              ->  Nested Loop  (cost=0.71..12.76 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1315064)
                                                    ->  Index Scan using "ProfessionalToCompanies_pkey" on "ProfessionalToCompanies" "ProfessionalToCompany"  (cost=0.42..8.45 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1315064)
                                                          Index Cond: ("FinancialProfessional".id = "FinancialProfessionalId")
                                                          Filter: ((address_zip)::text = '94596'::text)
                                                          Rows Removed by Filter: 1
                                                    ->  Index Only Scan using "FinancialCompanies_pkey" on "FinancialCompanies" "FinancialCompany"  (cost=0.29..4.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=909)
                                                          Index Cond: (id = "ProfessionalToCompany"."FinancialCompanyId")
                                                          Heap Fetches: 0
              ->  Index Scan using "ProfessionalToCompanies_pkey" on "ProfessionalToCompanies" "FinancialCompanies.ProfessionalToCompany"  (cost=0.42..8.45 rows=1 width=66) (actual time=0.006..0.006 rows=1 loops=10)
                    Index Cond: ("FinancialProfessionalId" = "FinancialProfessional".id)
                    Filter: ((address_zip)::text = '94596'::text)
        ->  Index Scan using "FinancialCompanies_pkey" on "FinancialCompanies"  (cost=0.29..7.82 rows=1 width=2116) (actual time=0.002..0.002 rows=1 loops=10)
              Index Cond: (id = "FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId")
  ->  Index Scan using fp_d_id on "Disclosures"  (cost=0.42..8.44 rows=1 width=80) (actual time=0.003..0.003 rows=0 loops=10)
        Index Cond: ("FinancialProfessional".id = "FinancialProfessionalId")
Planning time: 0.644 ms
Execution time: 5098.873 ms

Schema:

CREATE TABLE public."FinancialProfessionals"
(
id integer NOT NULL DEFAULT nextval('"FinancialProfessionals_id_seq"'::regclass),
full_name character varying(255),
last_name character varying(255),
alternate_names character varying(255)[],
title character varying(255)[],
crd integer,
licensed boolean,
"createdAt" timestamp with time zone NOT NULL,
"updatedAt" timestamp with time zone NOT NULL,
tsv tsvector,
"UserId" integer,
display_count integer DEFAULT 0,
years_f integer,
years_s integer,
CONSTRAINT "FinancialProfessionals_pkey" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX last_name_idx
ON public."FinancialProfessionals"
USING btree
(last_name COLLATE pg_catalog."default");
CREATE INDEX name_idx
ON public."FinancialProfessionals"
USING gin
(tsv);
CREATE INDEX crd_idx
ON public."FinancialProfessionals"
USING btree
(crd);

CREATE TABLE public."ProfessionalToCompanies"
(
address_street character varying(255),
address_city character varying(255),
address_state character varying(255),
address_zip character varying(255),
"createdAt" timestamp with time zone NOT NULL,
"updatedAt" timestamp with time zone NOT NULL,
"FinancialProfessionalId" integer NOT NULL,
"FinancialCompanyId" integer NOT NULL,
CONSTRAINT "ProfessionalToCompanies_pkey" PRIMARY KEY ("FinancialProfessionalId", "FinancialCompanyId"),
CONSTRAINT "ProfessionalToCompanies_FinancialCompanyId_fkey" FOREIGN KEY ("FinancialCompanyId")
REFERENCES public."FinancialCompanies" (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT "ProfessionalToCompanies_FinancialProfessionalId_fkey" FOREIGN KEY ("FinancialProfessionalId")
REFERENCES public."FinancialProfessionals" (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);

CREATE INDEX zip_idx
ON public."ProfessionalToCompanies"
USING btree
(address_zip COLLATE pg_catalog."default");

CREATE TABLE public."FinancialCompanies"
(
id integer NOT NULL DEFAULT nextval('"FinancialCompanies_id_seq"'::regclass),
name character varying(255),
address_street character varying(255),
address_city character varying(255),
address_state character varying(255),
address_zip character varying(255),
crd integer,
"createdAt" timestamp with time zone NOT NULL,
"updatedAt" timestamp with time zone NOT NULL,
company_name_tsv tsvector,
years_f integer,
CONSTRAINT "FinancialCompanies_pkey" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX company_name_idx
ON public."FinancialCompanies"
USING gin
(company_name_tsv);

CREATE TABLE public."Disclosures"
(
id integer NOT NULL DEFAULT nextval('"Disclosures_id_seq"'::regclass),
info text,
"createdAt" timestamp with time zone NOT NULL,
"updatedAt" timestamp with time zone NOT NULL,
"FinancialProfessionalId" integer,
"RegulatoryAgencyId" integer,
CONSTRAINT "Disclosures_pkey" PRIMARY KEY (id),
CONSTRAINT "Disclosures_FinancialProfessionalId_fkey" FOREIGN KEY ("FinancialProfessionalId")
REFERENCES public."FinancialProfessionals" (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT "Disclosures_RegulatoryAgencyId_fkey" FOREIGN KEY ("RegulatoryAgencyId")
REFERENCES public."RegulatoryAgencies" (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE SET NULL
)
WITH (
OIDS=FALSE
);
CREATE INDEX fp_d_id
ON public."Disclosures"
USING btree
("FinancialProfessionalId");
CREATE INDEX fp_r_id
ON public."Disclosures"
USING btree
("RegulatoryAgencyId");

FWIW, the following query runs in about 64 ms:

SELECT fp.full_name, array_agg(ptc), array_agg(d)
FROM
"ProfessionalToCompanies" ptc
JOIN "FinancialCompanies" fc ON ptc."FinancialCompanyId" = fc.id
JOIN "FinancialProfessionals" fp ON fp.id = ptc."FinancialProfessionalId"
LEFT OUTER JOIN "Disclosures" d ON fp.id = d."FinancialProfessionalId"
WHERE ptc.address_zip = '94596'
GROUP BY fp.id
ORDER BY fp.last_name ASC
limit 10

Is there some kind of index or anything else I can add to make this query performant?

Best Answer

The obvious index candidate would be on your ordering criteria. That way, PostgreSQL could walk the index in order in a nested loop until the limit is satisfied. That should help quite a bit.

But be careful. Such an index can perform much worse if many records have to be skipped because of the other conditions.

Edit

Looking at the EXPLAIN ANALYZE output, what strikes me is that the nested loop for your subquery retrieves no results most of the time and is executed 1.3 million times. That actually accounts for most of the time attributed to the grouping. The actual sort is very fast, because by that point there are almost no rows left. Maybe try an index on last_name and id, in that order?
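
A minimal sketch of that suggested composite index (the index name here is just an example, not from the original answer):

CREATE INDEX last_name_id_idx
ON public."FinancialProfessionals"
USING btree
(last_name, id);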

I am not entirely sure at this point. Also check your GEQO settings.
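
For reference, GEQO is controlled by standard PostgreSQL configuration parameters; you can inspect the current values, for example, with:

SHOW geqo;
SHOW geqo_threshold;
SHOW geqo_effort;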

Edit 2

The problem I see when reading the analyze output is that you are forced to aggregate inside the subquery that is used in the WHERE clause. That would explain why using subQuery has such a negative impact on performance.

Then there is the LIMIT, which makes PostgreSQL think "hey, I can do a nested loop here and it will probably be faster, because I can stop after finding 10 rows". But as it works through the nested loop it almost never finds any rows, so that turns out to be a very bad plan.
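
Purely as a session-level diagnostic (this is an assumption about how to investigate, not part of the original answer), you could temporarily discourage nested loops and re-run EXPLAIN ANALYZE on the generated query, to see whether the planner then chooses a better join strategy:

SET enable_nestloop = off;
-- re-run EXPLAIN ANALYZE on the generated query above
RESET enable_nestloop;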

I do not see an easy way to optimize this through the ORM without adding another layer.

Regarding "performance - Subquery with groupBy is much slower than querying via joins", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37866884/
