
MySQL query optimization: select with counted subquery extremely slow

Reposted. Author: 行者123. Updated: 2023-11-28 23:09:33

I have the following tables:

mysql> show create table rsspodcastitems \G
*************************** 1. row ***************************
Table: rsspodcastitems
Create Table: CREATE TABLE `rsspodcastitems` (
`id` char(20) NOT NULL,
`description` mediumtext,
`duration` int(11) default NULL,
`enclosure` mediumtext NOT NULL,
`guid` varchar(300) NOT NULL,
`indexed` datetime NOT NULL,
`published` datetime default NULL,
`subtitle` varchar(255) default NULL,
`summary` mediumtext,
`title` varchar(255) NOT NULL,
`podcast_id` char(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `podcast_id` (`podcast_id`,`guid`),
UNIQUE KEY `UKfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `IDXkcqf7wi47t3epqxlh34538k7c` (`indexed`),
KEY `IDXt2ofice5w51uun6w80g8ou7hc` (`podcast_id`,`published`),
KEY `IDXfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `published` (`published`),
FULLTEXT KEY `title` (`title`),
FULLTEXT KEY `summary` (`summary`),
FULLTEXT KEY `subtitle` (`subtitle`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> show create table station_cache \G
*************************** 1. row ***************************
Table: station_cache
Create Table: CREATE TABLE `station_cache` (
`Station_id` char(36) NOT NULL,
`item_id` char(20) NOT NULL,
`item_type` int(11) NOT NULL,
`podcast_id` char(20) NOT NULL,
`published` datetime NOT NULL,
KEY `Station_id` (`Station_id`,`published`),
KEY `IDX12n81jv8irarbtp8h2hl6k4q3` (`Station_id`,`published`),
KEY `item_id` (`item_id`,`item_type`),
KEY `IDXqw9yqpavo9fcduereqqij4c80` (`item_id`,`item_type`),
KEY `podcast_id` (`podcast_id`,`published`),
KEY `IDXkp2ehbpmu41u1vhwt7qdl2fuf` (`podcast_id`,`published`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

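The item_id column of the second table references the id column of the first. There is no foreign key between the two because the relationship is polymorphic: the second table may reference entities that are not in the first table but in other similar, yet distinct, tables.

I am trying to write a query that lists the latest items in the first table that have no corresponding rows in the second table. The best-performing query I have found so far is: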

select i.*,
       (select count(station_id)
        from station_cache
        where item_id = i.id) as stations
from rsspodcastitems i
having stations = 0
order by published desc

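I have also considered using a where not exists (...) subquery to perform the restriction, but that was actually slower than the query above. Even so, the query above still takes a considerable amount of time to complete. MySQL's query plan does not appear to use the available indexes: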

+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 106978 | Using filesort |
| 2 | DEPENDENT SUBQUERY | station_cache | ALL | NULL | NULL | NULL | NULL | 44227 | Using where |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+

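Note that neither part of the query uses a key, although it should be able to use KEY published (published) on the main table and KEY item_id (item_id, item_type) for the subquery.

Any suggestions on how to get the proper results without waiting several minutes?

Best answer

I would expect the fastest query to be: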

select i.*
from rsspodcastitems i
where not exists (select 1
                  from station_cache sc
                  where sc.item_id = i.id
                 )
order by published desc;

This would take advantage of an index on station_cache(item_id) and possibly one on rsspodcastitems(published, id).
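
As a rough sketch of how those indexes could be put in place and checked: station_cache already has a key whose leading column is item_id, so only the rsspodcastitems index is added here. The index name idx_published_id is made up for illustration and does not appear in the original schema.

```sql
-- Assumption: idx_published_id is a hypothetical name chosen for this example.
-- The composite key covers the sort column (published) and the join column (id).
ALTER TABLE rsspodcastitems
  ADD INDEX idx_published_id (published, id);

-- Re-run the plan for the NOT EXISTS form and compare it with the plan in the question.
EXPLAIN
SELECT i.*
FROM rsspodcastitems i
WHERE NOT EXISTS (SELECT 1
                  FROM station_cache sc
                  WHERE sc.item_id = i.id)
ORDER BY i.published DESC;
```

In the new plan, the key column of the station_cache row is the thing to watch. If it is still NULL, the lookup on item_id is not using any index and the subquery remains a full scan per outer row.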

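Your query could conceivably be faster if it returns a large number of rows. The way you have phrased it allows the index on rsspodcastitems(published) to be used to avoid the filesort. If you remove the order by, the exists version should be faster.

I should note that I like your use of the having clause. When faced with this in the past, I have used a subquery: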
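
To see whether the existing published key can replace the filesort, one option is to hint it explicitly and read the plan. This is only a sketch under stated assumptions: FORCE INDEX is standard MySQL syntax and published is the key name from the schema above, but the LIMIT 50 is an assumption (the question asks for the "latest" items without giving a number), and whether the hint helps depends on how many rows survive the filter.

```sql
-- Assumption: only the newest 50 unmatched items are needed; adjust or drop the LIMIT.
EXPLAIN
SELECT i.*
FROM rsspodcastitems i FORCE INDEX (published)
WHERE NOT EXISTS (SELECT 1
                  FROM station_cache sc
                  WHERE sc.item_id = i.id)
ORDER BY i.published DESC
LIMIT 50;
```

If the Extra column no longer shows "Using filesort", the rows are being read in index order, and with a LIMIT the scan can stop as soon as enough unmatched items have been found.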

select i.*,
       (select count(station_id)
        from station_cache
        where item_id = i.id) as stations
from (select i.*
      from rsspodcastitems i
      order by published desc
     ) i
where not exists (select 1
                  from station_cache sc
                  where sc.item_id = i.id
                 );

This allows one index to be used for the sort.

I prefer a slight variation on your method:

select i.*,
       (exists (select 1
                from station_cache sc
                where sc.item_id = i.id
               )
       ) as has_station
from rsspodcastitems i
having has_station = 0
order by published desc;

This should be slightly faster than the version with count().
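
One more thing that may be worth checking, offered as an assumption rather than something the answer above establishes: the schema in the question declares rsspodcastitems with CHARSET=utf8 and station_cache with CHARSET=latin1. The MySQL manual notes that comparing a utf8 column with a latin1 column precludes use of an index, which would be consistent with possible_keys being NULL for station_cache in the plan. Two ways this could be tested are sketched below.

```sql
-- Option 1 (no schema change): convert the outer value to latin1 so the
-- comparison happens in station_cache's own character set.
-- Assumption: ids contain only characters representable in latin1.
EXPLAIN
SELECT i.*
FROM rsspodcastitems i
WHERE NOT EXISTS (SELECT 1
                  FROM station_cache sc
                  WHERE sc.item_id = CONVERT(i.id USING latin1))
ORDER BY i.published DESC;

-- Option 2 (changes the schema and rebuilds the table; try it on a copy first):
-- give item_id the same character set as rsspodcastitems.id.
ALTER TABLE station_cache
  MODIFY item_id CHAR(20) CHARACTER SET utf8 NOT NULL;
```

If the plan for Option 1 shows a ref lookup on the item_id key instead of type ALL, the character set mismatch is the likely cause, and aligning the column definitions would let any of the queries above use the index.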

Regarding "MySQL query optimization: select with counted subquery extremely slow", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46354829/
