gpt4 book ai didi

sql - 如何调整表包含 30,000 多行的 7 表连接 MySQL 计数查询?

转载 作者:可可西里 更新时间:2023-11-01 06:29:18 24 4
gpt4 key购买 nike

我有一个 sql 查询,用于计算复杂查询的结果数。当限制为 20 个结果时,实际的选择查询非常快,但经过大量优化后,计数版本在我当前的表上大约需要 4.5 秒。

如果我删除网站标签和画廊标签上的两个连接和 where 子句,查询将在 1.5 秒内执行。如果我创建 3 个单独的查询 - 一个选择付费网站,一个选择名称,一个将所有内容放在一起 - 我可以将查询缩短到 0.6 秒,这仍然不够好。这也将迫使我使用存储过程,因为我将不得不在 Hibernate 中进行总共 4 次查询。

对于“原样”查询,这里有一些信息:

Handler_read_key 为 1746669
Handler_read_next 为 1546324

gallery 表有 40,000 行
站点表有 900 行
名称表有 800 行
标签表有3560行

我对 MySQL 和调优还很陌生,我在以下方面有索引:

  • 标签表中的'term'列
  • 图库表中的“已发布”列
  • 名称表的“值”

我希望将此查询缩短到 0.1 毫秒。

SELECT count(distinct gallery.id)
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and (
name.value LIKE 'sometext%' or
tag.term = 'sometext' or
site.`name` like 'sometext%' or
site_tag.term = 'sometext'
)

解释数据:

| id | select_type | table        | type   | possible_keys                                                     | key                | key_len | ref                                       | rows | Extra                              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
| 1 | SIMPLE | site | index | PRIMARY,nameIndex | nameIndex | 258 | NULL | 950 | Using index; Using temporary |
| 1 | SIMPLE | gallery | ref | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex | FKF44C775296EECE37 | 9 | production.site.id | 20 | Using where |
| 1 | SIMPLE | g2mn | ref | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188 | FK3EFFD7F8AFAD7A5E | 8 | production.gallery.id | 1 | Using index; Distinct |
| 1 | SIMPLE | name | eq_ref | PRIMARY,valueIndex | PRIMARY | 8 | production.g2mn.name_id | 1 | Distinct |
| 1 | SIMPLE | g2t | ref | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6 | FK3DDB4D63AFAD7A5E | 8 | production.g2mn.gallery_id | 2 | Using where; Using index; Distinct |
| 1 | SIMPLE | tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.g2t.tag_id | 1 | Distinct |
| 1 | SIMPLE | p2t | ref | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6 | PRIMARY | 8 | production.gallery.site_id | 3 | Using where; Using index; Distinct |
| 1 | SIMPLE | site_tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.p2t.tag_id | 1 | Using where; Distinct |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+

个人计数速度:

[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385

[SQL]
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615

[SQL]
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104

[SQL]
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560

[SQL]
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901

[SQL]
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026

最佳答案

我在本文末尾包含了我的测试架构和用于生成测试数据的脚本。我用过 SQL_NO_CACHE防止 MySQL 缓存查询结果的选项 - 这仅用于测试,最终应该被删除。

这个和Donnie提出的思路差不多,不过我稍微整理了一下。如果我正确理解了联接,则无需在每个选择中重复所有联接,因为每个联接实际上都独立于其他联接。原始的 WHERE 子句规定 gallery.published 必须为真,然后是一系列由 OR 连接的 4 个条件。因此,每个查询都可以单独执行。以下是四个连接:

gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag

因为 gallery 包含 site_id,在这种情况下,不需要通过 site 表进行中间连接。因此,最后一个连接可以简化为:

gallery <--> site_to_tag <--> tag

分别运行每个 SELECT,并使用 UNION 组合结果,速度非常快。这里的结果假定表结构和索引显示在本文末尾:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;

+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (0.00 sec)

速度确实因搜索条件而异。在以下示例中,每个表使用了不同的搜索值,LIKE 运算符必须做更多的工作,因为现在每个表都有更多的潜在匹配项:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;

+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (0.24 sec)

这些结果优于使用多个连接的查询:

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3GRD' OR
st.term = '3GRD' OR
n.value LIKE '3GRD%' OR
s.name LIKE '3GRD%');

+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (2.62 sec)

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3IDP' OR
st.term = '3OJX' OR
n.value LIKE '3G%' OR
s.name LIKE '3H%');

+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (3.17 sec)

架构
id 列的索引加上 site.namename.valuetag.term 很重要:

DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;

-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site` ;
CREATE TABLE IF NOT EXISTS `site` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(255) NOT NULL ,
INDEX `name` (`name` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery` ;
CREATE TABLE IF NOT EXISTS `gallery` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`site_id` INT UNSIGNED NOT NULL ,
`published` TINYINT(1) NOT NULL DEFAULT 0 ,
PRIMARY KEY (`id`) ,
INDEX `fk_gallery_site` (`site_id` ASC) ,
CONSTRAINT `fk_gallery_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `name` ;
CREATE TABLE IF NOT EXISTS `name` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`value` VARCHAR(255) NOT NULL ,
INDEX `value` (`value` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `tag` ;
CREATE TABLE IF NOT EXISTS `tag` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`term` VARCHAR(255) NOT NULL ,
INDEX `term` (`term` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE TABLE IF NOT EXISTS `gallery_to_name` (
`gallery_id` INT UNSIGNED NOT NULL ,
`name_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `name_id`) ,
INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
CONSTRAINT `fk_gallery_to_name_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_name_name`
FOREIGN KEY (`name_id` )
REFERENCES `name` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE TABLE IF NOT EXISTS `gallery_to_tag` (
`gallery_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `tag_id`) ,
INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_gallery_to_tag_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site_to_tag` ;
CREATE TABLE IF NOT EXISTS `site_to_tag` (
`site_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`site_id`, `tag_id`) ,
INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_site_to_tag_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_site_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;

测试数据
这会填充 site 900 行,tag 3560 行,name 800 行,gallery 40,000 行,并将条目插入链接表:

DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
DECLARE i INT DEFAULT 0;

WHILE i < 900 DO
INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;

SET i = 0;
WHILE i < 3560 DO
INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
SET i = i + 1;
END WHILE;

SET i = 0;
WHILE i < 800 DO
INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;

SET i = 0;
WHILE i < 40000 DO
INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL populate();

关于sql - 如何调整表包含 30,000 多行的 7 表连接 MySQL 计数查询?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3151410/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com