gpt4 book ai didi

当键可用时MySQL查询全表扫描

转载 作者:行者123 更新时间:2023-11-29 15:13:04 25 4
gpt4 key购买 nike

在尝试从多个连接表中提取大量列(~15-20)时,我将 2 个 View 放在一起,以提取必要的信息。然而,在我的本地数据库(只有 ~1k posts 行)中,加入这些 View 效果很好;当我在生产数据库(~30k posts 行)上创建相同的 View 并尝试加入 View 时,我意识到该解决方案无法扩展到测试数据集之外。

我尝试将这 2 个 View (类别数据(如 categories.title)和创作者数据(如 users.display_name)迁移到 CTE post_data 中。从理论上讲,它将充当这些 View 的键控版本,并允许我获取符合条件的帖子的所有帖子数据。

我已经整理了一个样本DBFiddle用一些测试数据来解释表结构。实际数据还有更多列,但这代表了构建查询所需的联接。

table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id | parent_id | created_by | message | attachments |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| 8 | NULL | 8 | laptop for sale | [{"media_id": 1380}] |
| 9 | NULL | 4 | NEW lamp shade up for grabs | [{"media_id": 1442}, {"link_id": 103}] |
| 10 | 1 | 7 | Oooh I could be interested | |
| 11 | 1 | 7 | DMing you now! I've been looking for one | |
+-----+-----------+------------+------------------------------------------+----------------------------------------+

table : users
+----+------------------+---------------------------+
| id | display_name | created_at |
+----+------------------+---------------------------+
| 1 | John Appleseed | 2018-02-20T00:00:00+00:00 |
| 2 | Massimo Jenkins | 2018-05-14T00:00:00+00:00 |
| 3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
| 4 | Jackson Creek | 2018-11-15T00:00:00+00:00 |
| 5 | Joe Schmoe | 2019-01-09T00:00:00+00:00 |
| 6 | John Johnson | 2019-02-14T00:00:00+00:00 |
| 7 | Donna Madison | 2019-05-14T00:00:00+00:00 |
| 8 | Jenna Kaplan | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+

table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by | title | description |
+----+------------+------------+-------------------------------------------------------+
| 1 | 2 | Technology | Anything tech; Consumer, business or education tools! |
| 2 | 2 | Home Goods | Anything for the home |
+----+------------+------------+-------------------------------------------------------+

table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
+---------+-------------+

table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+---------+-------------+

table : posts_removed
+---------+----------------------+------------+
| post_id | removed_at | removed_by |
+---------+----------------------+------------+
| 10 | 2019-01-22 09:08:14 | 7 |
+---------+----------------------+------------+

在下面的查询中,符合条件的帖子是在基 SELECT 中确定的;然后,post_data CTE 连接到结果集(限制为 25 行),并返回 CTE 中的所有列。

WITH post_data AS (
SELECT posts.id,
posts.parent_id,
posts.created_by,
posts.attachments,
categories_posts.category_id,
categories.title,
categories.created_by AS category_created_by,
creator.display_name AS creator_display_name,
creator.created_at AS creator_created_at
/* ... And a whole bunch of other fields from posts, categories_posts, users */
FROM posts
LEFT OUTER JOIN categories_posts
ON categories_posts.post_id = posts.id
LEFT OUTER JOIN categories
ON categories.id = categories_posts.category_id
LEFT OUTER JOIN users creator
ON creator.id = posts.created_by
/* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
/* Set up the criteria for the posts selected before getting their data from the CTE */
LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
LEFT OUTER JOIN users user_me ON user_me.id = "1"
LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
AND users_followed.followed_by = user_me.id
LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
/* Join our post_data on the post's ID */
JOIN post_data ON post_data.id = posts.id
WHERE
(
(
users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
) OR categories_posts.category_id IS NULL
) AND (
posts.created_by = user_me.id
OR users_followed.followed_by = user_me.id
OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
AND pp_removed.removed_at IS NULL
AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25

理论上,我认为这可以通过根据基本选择条件选择行,然后根据帖子 ID 对 CTE 进行索引扫描来实现;然而,查询优化器似乎选择对posts进行全表扫描。表。

EXPLAIN SELECT给了我这个信息:

+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | extra |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| 1 | PRIMARY | posts | ALL | PRIMARY,parent_id,created_by | | | | 33870 | 100 | Using temporary; Using filesort |
| 1 | PRIMARY | removed | eq_ref | PRIMARY | PRIMARY | 8 | posts.id | 1 | 19 | Using where |
| 1 | PRIMARY | user_me | const | PRIMARY | PRIMARY | 8 | const | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 1 | PRIMARY | categories | eq_ref | PRIMARY | PRIMARY | 8 | categories_posts.category_id | 1 | 100 | Using index |
| 1 | PRIMARY | users_categories | eq_ref | user_id_2,user_id,category_id | user_id_2 | 16 | user_me.id,api.categories_posts.category_id | 1 | 100 | Using where |
| 1 | PRIMARY | users_followed | eq_ref | user_id,followed_by | user_id | 16 | posts.created_by,api.user_me.id | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 19 | Using where |
| 1 | PRIMARY | <derived2> | ALL | | | | | 493911 | 19 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | posts | ALL | | | | | 33870 | 100 | Using temporary |
| 2 | DERIVED | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 2 | DERIVED | categories | eq_ref | PRIMARY | PRIMARY | 8 | api.categories_posts.category_id | 1 | 100 | |
| 2 | DERIVED | posts_votes | ref | post_id | post_id | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | pp | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 100 | |
| 2 | DERIVED | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.pp.id | 1 | 100 | Using index |
| 2 | DERIVED | removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | creator | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.created_by | 1 | 100 | |
| 2 | DERIVED | usernames | ref | user_id | user_id | 8 | api.creator.id | 1 | 100 | |
| 2 | DERIVED | verifications | ALL | | | | | 4 | 100 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | categories_identifiers | ref | category_id | category_id | 8 | api.categories.id | 1 | 100 | |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+

除此之外,我尝试重构我的查询以尝试强制在 posts 中使用 key 。表,例如使用 FORCE INDEX(PRIMARY)在选择中,将 CTE 移动为基本查询并添加过滤器 WHERE id IN ({the original base query}) ,但优化器似乎仍然进行全表扫描。

如果解码查询计划中发生的事情有帮助:

  • 在撰写本文时,有 33,387 posts行,但查询计划显示
  • 查询计划显示全表扫描,返回 33,870
  • 查询计划还显示派生表 ( <derived2> ) 具有 493,911

我的核心问题是:

  1. 当我说子查询只能在基本选择查询的每个结果行执行一次时,我是否正确?如果是这样,那么 CTE 还应该在 posts.id 上使用 JOIN并可能使用表索引?

  2. 为什么查询计划显示它选择了 33,870 行,而实际上只有 33,387 行? 493,911 行从何而来?

  3. 在这种情况下如何防止全表扫描?

最佳答案

尝试一下...在 JOINingWITH 之前执行 LIMIT 25:

SELECT * FROM
( SELECT ... FROM posts
JOIN categories_posts ...
ORDER BY posts.id DESC
LIMIT 25 ) AS x
JOIN post_data
ON post_data.id IN (x.id, x.parent_id)
ORDER BY posts.id DESC

关于当键可用时MySQL查询全表扫描,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59923485/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com