
MySQL performance optimization: order by datetime field

Reposted. Author: IT老高. Updated: 2023-10-28 13:00:05

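I have a table containing about 100,000 blog postings, linked to a table of 50 feeds through a 1:n relationship. When I query both tables with a SELECT statement ordered by a datetime field of the postings table, MySQL always uses a filesort, which makes the query very slow (> 1 second). Here is the (simplified) schema of the postings table: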

+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+

And here is the feeds table:

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

Here is the query that takes more than 1 second to execute. Note that the post_date field has an index, but MySQL does not use it to sort the postings table:

SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    (`postings`)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15

The output of EXPLAIN EXTENDED shows that MySQL is using a filesort:

+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+

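When I remove the ORDER BY clause, MySQL stops using a filesort. If you have any ideas on how to optimize this query so that MySQL uses an index both to select and to sort the data, please let me know. I have already tried a few things suggested in various blog posts, such as creating a combined index on all WHERE/ORDER BY fields, but that did not work either.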

Best answer

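Create a composite index on postings (is_active, post_date), in that order.

It will be used both for filtering on is_active and for ordering by post_date.

MySQL should show the REF access method over this index in EXPLAIN EXTENDED.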
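A minimal sketch of the same idea follows. It uses SQLite, through Python's built-in sqlite3 module, as a stand-in for MySQL because SQLite's planner is easy to inspect from a self-contained script. Several assumptions apply. Only the postings columns the query touches are modelled. The feeds join is left out. SQLite's "USE TEMP B-TREE FOR ORDER BY" plan step is treated as the rough counterpart of MySQL's "Using filesort". The script checks the plan before and after creating the (is_active, post_date) index. MySQL's own EXPLAIN output will look different.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Only the postings columns the query
# touches are modelled; post_date is stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE postings (
           id INTEGER PRIMARY KEY,
           feed_id INTEGER NOT NULL,
           post_date TEXT NOT NULL,
           is_active INTEGER NOT NULL,
           user_offtopic_count INTEGER NOT NULL
       )"""
)

# The question's query, without the feeds join.
QUERY = """SELECT id FROM postings
           WHERE user_offtopic_count < 10 AND is_active = 1
           ORDER BY post_date DESC
           LIMIT 15"""


def needs_sort(sql):
    """Return True if SQLite plans an explicit sort step for ORDER BY."""
    # Column 3 of each EXPLAIN QUERY PLAN row is the readable detail text.
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE FOR ORDER BY" in d for d in details)


sort_before = needs_sort(QUERY)
conn.execute("CREATE INDEX ix_active_date ON postings (is_active, post_date)")
sort_after = needs_sort(QUERY)
print("explicit sort without index:", sort_before)
print("explicit sort with index:   ", sort_after)
```

The equality on is_active pins the first index column. The remaining post_date column is therefore already in ORDER BY order, and the engine can read the index backwards for DESC.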

Note that you have a RANGE filtering condition on user_offtopic_count. That is why a single index cannot serve both to filter on this field and to sort by another field.
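The next sketch illustrates why the range condition blocks index ordering. It also shows why an index on all WHERE/ORDER BY fields, as tried in the question, does not help. It uses the same SQLite stand-in assumptions. The three-column index ix_all is hypothetical. SQLite's INDEXED BY clause forces it, loosely like MySQL's FORCE INDEX. With a range (< 10) on the middle column, the matching rows come out ordered by user_offtopic_count first, so post_date still has to be sorted. With an equality on that column instead, the index order already matches ORDER BY post_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings (id INTEGER PRIMARY KEY, post_date TEXT NOT NULL,"
    " is_active INTEGER NOT NULL, user_offtopic_count INTEGER NOT NULL)"
)
# Hypothetical "index on every WHERE/ORDER BY column", as tried in the question.
conn.execute(
    "CREATE INDEX ix_all ON postings (is_active, user_offtopic_count, post_date)"
)


def needs_sort(sql):
    """Return True if SQLite's plan contains an explicit sort for ORDER BY."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE FOR ORDER BY" in d for d in details)


# INDEXED BY makes SQLite use ix_all (or fail), loosely like MySQL's FORCE INDEX.
TEMPLATE = """SELECT id FROM postings INDEXED BY ix_all
              WHERE is_active = 1 AND user_offtopic_count {cond}
              ORDER BY post_date DESC LIMIT 15"""

range_sorts = needs_sort(TEMPLATE.format(cond="< 10"))    # range on 2nd column
equality_sorts = needs_sort(TEMPLATE.format(cond="= 0"))  # equality on 2nd column
print("range condition needs sort:   ", range_sorts)
print("equality condition needs sort:", equality_sorts)
```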

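Depending on how selective your user_offtopic_count condition is (that is, how many rows satisfy user_offtopic_count < 10), it may be more useful to create an index on user_offtopic_count and let the post_date values be sorted.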
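One way to estimate this selectivity is to count how many active rows pass the filter. The sketch below runs on SQLite with hypothetical generated data. The SUM(condition) / COUNT(*) query itself should also run on MySQL, where a comparison evaluates to 1 or 0.

As a rough rule of thumb (an assumption, not a measured threshold), a small fraction favours the range index. A large fraction favours the (is_active, post_date) index, which can stop reading after the first 15 matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings (id INTEGER PRIMARY KEY,"
    " is_active INTEGER NOT NULL, user_offtopic_count INTEGER NOT NULL)"
)
# Hypothetical data: 1000 active postings whose off-topic counts cycle 0..49.
conn.executemany(
    "INSERT INTO postings (is_active, user_offtopic_count) VALUES (?, ?)",
    [(1, i % 50) for i in range(1000)],
)


def selectivity(threshold):
    """Fraction of active postings with user_offtopic_count < threshold."""
    matching, total = conn.execute(
        "SELECT SUM(user_offtopic_count < ?), COUNT(*)"
        " FROM postings WHERE is_active = 1",
        (threshold,),
    ).fetchone()
    # SUM() over zero rows is NULL (None in Python), so guard the empty case.
    return (matching or 0) / total if total else 0.0


print(selectivity(10))  # 200 of the 1000 rows pass
```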

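To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure that the RANGE access method over this index is used.

Which index is faster depends on your data distribution. Create both indexes, FORCE each one in turn, and see which is faster: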

CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    `postings` FORCE INDEX (ix_active_offtopic)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15

/* This should show RANGE access with few rows and keep the FILESORT */

SELECT
    `postings`.`id`,
    UNIX_TIMESTAMP(postings.post_date) as post_date,
    `postings`.`link`,
    `postings`.`title`,
    `postings`.`author`,
    `postings`.`excerpt`,
    `postings`.`long_excerpt`,
    `feeds`.`title` AS feed_title,
    `feeds`.`website` AS feed_website
FROM
    `postings` FORCE INDEX (ix_active_date)
JOIN
    `feeds`
ON
    `feeds`.`id` = `postings`.`feed_id`
WHERE
    `feeds`.`type` = 1 AND
    `postings`.`user_offtopic_count` < 10 AND
    `postings`.`is_active` = 1
ORDER BY
    `postings`.`post_date` desc
LIMIT
    15

/* This should show REF access with lots of rows and no FILESORT */
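The same experiment can be reproduced in miniature under the SQLite stand-in assumptions. INDEXED BY stands in for FORCE INDEX, and the data is hypothetical: 2,000 generated postings with unique timestamps, with the feeds join omitted. The sketch prints each forced plan and checks that both indexes return the same 15 rows, so only the access path differs. It does not measure speed. Which index is actually faster still has to be measured on the real data.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings (id INTEGER PRIMARY KEY, post_date TEXT NOT NULL,"
    " is_active INTEGER NOT NULL, user_offtopic_count INTEGER NOT NULL)"
)
# Hypothetical data: 2000 postings one minute apart, so post_date is unique.
start = datetime(2009, 1, 1)
rows = [
    ((start + timedelta(minutes=i)).isoformat(sep=" "), i % 2, i % 20)
    for i in range(2000)
]
conn.executemany(
    "INSERT INTO postings (post_date, is_active, user_offtopic_count)"
    " VALUES (?, ?, ?)",
    rows,
)
conn.execute("CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count)")
conn.execute("CREATE INDEX ix_active_date ON postings (is_active, post_date)")

# INDEXED BY forces the named index, loosely like MySQL's FORCE INDEX.
TEMPLATE = """SELECT id FROM postings INDEXED BY {index}
              WHERE user_offtopic_count < 10 AND is_active = 1
              ORDER BY post_date DESC
              LIMIT 15"""

results = {}
for index in ("ix_active_offtopic", "ix_active_date"):
    sql = TEMPLATE.format(index=index)
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    ids = [row[0] for row in conn.execute(sql)]
    results[index] = (details, ids)
    print(index, "->", "; ".join(details))
```

The range index should show a range search plus a temporary sort step. The date index should show an index search with no sort step, mirroring the RANGE + FILESORT versus REF behaviour described in the comments above.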

A similar question on MySQL performance optimization (order by datetime field) can be found on Stack Overflow: https://stackoverflow.com/questions/714950/
