
mysql - How to optimize GROUP BY on a calculated field (so an index is used)?


I have a large data table (almost 10M records) and, for performance reasons, a secondary aggregate companion table. The aggregate table is periodically filled with whatever data has not been aggregated so far:

REPLACE INTO aggregate (channel_id, type, timestamp, value, count)
SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp,
       SUM(value) AS value, COUNT(timestamp) AS count
FROM data
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
  AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"),
                           INTERVAL 1 DAY)) * 1000 FROM aggregate WHERE type = 'day'), 0)
GROUP BY channel_id, YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));

I noticed that the SELECT part of the statement is quite slow (more than 2 seconds on a fast PC) even when no data is returned. Since the aggregation needs to run on an embedded device, this is a problem. The plan looks like this:

id  select_type  table      type   key      key_len  rows     Extra
1   PRIMARY      data       ALL    NULL     NULL     9184560  Using where; Using temporary; Using filesort
2   SUBQUERY     aggregate  index  ts_uniq  22       1940     Using where; Using index

The subquery itself is instant. Obviously data is not using the channel_id/timestamp index, due to the calculations in the GROUP BY clause:

CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_ADF3F36372F5A1AA` (`channel_id`)
) ENGINE=MyISAM AUTO_INCREMENT=10432870 DEFAULT CHARSET=latin1;

Can the query be optimized further?

Update: adding the requested information

SHOW INDEXES FROM data;

Table  Non_unique  Key_name    Seq_in_index  Column_name  Collation  Cardinality  Null  Index_type
data   0           PRIMARY     1             id           A          9184560            BTREE
data   0           ts_uniq     1             channel_id   A          164          YES   BTREE
data   0           ts_uniq     2             timestamp    A          9184560            BTREE
data   1           IDX_ADF3..  1             channel_id   A          164          YES   BTREE

CREATE TABLE `aggregate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) NOT NULL,
`type` varchar(8) NOT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`type`,`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=1941 DEFAULT CHARSET=latin1;

I have also noticed that when the GROUP BY is changed to channel_id, timestamp, the query becomes instant. Unfortunately, adding the date calculation as a column to data is not desirable, since the grouping is calculated dynamically.

I cannot understand why the GROUP BY index would be such a problem when there isn't even any data to group. I have tried running

SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count FROM data 
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"), INTERVAL 1 day)) * 1000
FROM aggregate WHERE type = 'day'), 0)

which is just as slow, so GROUP BY doesn't seem to be the problem after all?

Update 2

Digging further down this path shows that

SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data 
WHERE timestamp >= (SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"),
INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day');

is still slow (1.4 s), so it is not a GROUP BY problem at all.

Update 3

This is still slow:

SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data WHERE timestamp >= 1380837600000;

So the problem is that the inner comparison is on timestamp, which cannot use the channel_id, timestamp index even though it is part of the GROUP BY clause. That raises the question: how can that index be forced?
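For reference, MySQL's index hint syntax would look like the following (a minimal sketch, not part of the original question; since timestamp is only the second column of ts_uniq, forcing that index is unlikely to speed up a range scan on timestamp alone):

SELECT channel_id, 'day' AS type, timestamp, value, 1
FROM data FORCE INDEX (ts_uniq)  -- ask the optimizer to consider only ts_uniq
WHERE timestamp >= 1380837600000;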

Best Answer

Add year and dayofyear columns to the data table and index (channel_id, year, dayofyear). Populate the two new columns when inserting rows.
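A minimal sketch of that suggestion (the column and index names year, dayofyear and idx_channel_day are assumptions, not spelled out in the original answer):

-- Add the precomputed grouping columns and an index covering them
ALTER TABLE data
  ADD COLUMN `year` SMALLINT NOT NULL DEFAULT 0,
  ADD COLUMN `dayofyear` SMALLINT NOT NULL DEFAULT 0,
  ADD INDEX `idx_channel_day` (`channel_id`, `year`, `dayofyear`);

-- Backfill existing rows once; new rows should set both columns at insert time
UPDATE data
  SET `year` = YEAR(FROM_UNIXTIME(timestamp/1000)),
      `dayofyear` = DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));

The aggregation query can then use GROUP BY channel_id, year, dayofyear on plain columns, which the new index can serve.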

Regarding mysql - How to optimize GROUP BY on a calculated field (so an index is used)?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19364323/
