gpt4 book ai didi

mysql - 不同与分组依据

转载 作者:IT王子 更新时间:2023-10-28 23:52:29 25 4
gpt4 key购买 nike

我有两个这样的表。'order' 表有 21886 行。

CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci


CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=572375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

我的问题在这里。

MariaDB [test]> explain
-> SELECT DISTINCT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 22151 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index; Distinct |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [test]> explain
-> SELECT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> GROUP BY A.id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 65 | Using index; Using temporary |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+

如上所列,两个查询返回相同的结果,但不同的结果太慢(解释太多行)。有什么区别?

最佳答案

通常建议使用 DISTINCT 而不是 GROUP BY,因为那是你真正想要的,让优化器选择“最佳”执行计划。但是 - 没有优化器是完美的。使用 DISTINCT 优化器可以为执行计划提供更多选项。但这也意味着它有更多的选项来选择一个糟糕的计划

您写道 DISTINCT 查询“慢”,但您没有给出任何数字。在我的测试中(在 MariaDB 10.0.1910.3.13 上有 10 倍的行数)DISTINCT 查询就像(仅)25 % 慢 (562ms/453ms)。 EXPLAIN 结果毫无帮助。甚至是“说谎”。使用 LIMIT 100, 30 它需要至少读取 130 行(这就是我的 EXPLAIN 实际上为 GROUP BY 显示的内容),但它显示你 65 岁。

我无法解释 25% 的执行时间差异,但似乎引擎在任何情况下都在进行全表/索引扫描,并对结果进行排序,然后才能跳过 100 行并选择 30 行。

最好的计划可能是:

  • idx_reg_date索引(表A)中按降序逐行读取
  • 查看idx_order_id索引(表B)是否匹配
  • 跳过 100 个匹配行
  • 发送 30 个匹配行
  • 退出

如果 A 中有大约 10% 的行在 B 中没有匹配项,则该计划将从 A 中读取大约 143 行>.

我能以某种方式强制执行此计划的最佳做法是:

SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100

此查询在 156 毫秒内返回相同的结果(比 GROUP BY 快 3 倍)。但这还是太慢了。而且它可能仍在读取表 A 中的所有行。

我们可以通过一个“小”子查询技巧证明可以存在更好的计划:

SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100

此查询“没有时间”(~ 0 毫秒)执行,并在我的测试数据上返回相同的结果。虽然它不是 100% 可靠,但它表明优化器做得不好。

那么我的结论是什么:

  • 优化器并不总是做得最好,有时需要帮助
  • 即使我们知道“最好的计划”,我们也不能总是执行它
  • DISTINCT 并不总是比 GROUP BY
  • 当没有索引可以用于所有子句时 - 事情变得非常棘手

测试模式和虚拟数据:

drop table if exists `order`;
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

insert into `order`(reg_date)
select from_unixtime(floor(rand(1) * 1000000000)) as reg_date
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 218860;

drop table if exists `order_detail_products`;
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

insert into order_detail_products(id, order_id, order_detail_id, prod_id)
select null as id
, floor(rand(2)*218860)+1 as order_id
, 0 as order_detail_id
, 0 as prod_id
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 437320;

查询:

SELECT DISTINCT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 562 ms

SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 453 ms

SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 156 ms

SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- ~ 0 ms

关于mysql - 不同与分组依据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25114506/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com