
mysql - Optimizing a MySQL full outer join for massive data

Reposted. Author: 可可西里. Updated: 2023-11-01 08:17:15

We have the following MySQL tables (simplified to keep things straightforward):

CREATE TABLE `MONTH_RAW_EVENTS` (
`idEvent` int(11) unsigned NOT NULL,
`city` varchar(45) NOT NULL,
`country` varchar(45) NOT NULL,
`ts` datetime NOT NULL,
`idClient` varchar(45) NOT NULL,
`event_category` varchar(45) NOT NULL,
-- ... bunch of other fields
PRIMARY KEY (`idEvent`),
KEY `idx_city` (`city`),
KEY `idx_country` (`country`),
KEY `idClient` (`idClient`)
) ENGINE=InnoDB;

CREATE TABLE `compilation_table` (
`idClient` int(11) unsigned DEFAULT NULL,
`city` varchar(200) DEFAULT NULL,
`month` int(2) DEFAULT NULL,
`year` int(4) DEFAULT NULL,
`events_profile` int(10) unsigned NOT NULL DEFAULT '0',
`events_others` int(10) unsigned NOT NULL DEFAULT '0',
`events_total` int(10) unsigned NOT NULL DEFAULT '0',
KEY `idx_month` (`month`),
KEY `idx_year` (`year`),
KEY `idx_idClient` (`idClient`),
KEY `idx_city` (`city`)
) ENGINE=InnoDB;

MONTH_RAW_EVENTS contains almost 20 million rows of actions that users performed on the website, and its size is close to 4 GB.

compilation_table holds a per-month summary for each client/city pair; we use it to display statistics on the website in real time.

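We process the statistics once a month (from the first table into the second), and we are trying to optimize the query that does this, because until now we have done all of this processing in PHP, which takes a very long time.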

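This is the query we came up with. It seems to do the job on small subsets of the data, but processing the full data set takes more than 6 hours: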

INSERT INTO compilation_table (idClient,city,month,year,events_profile,events_others)


SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others

FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>='2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
LEFT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>='2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient

UNION

SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others
FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>='2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
RIGHT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>='2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient

What we are trying to do is a FULL OUTER JOIN in MySQL, so the query basically follows the pattern of the one proposed here.

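How can we optimize the query? We have been trying different indexes and switching things around, but after 8 hours it still had not finished running.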

The MySQL server is a dedicated Percona MySQL 5.5 machine with 2 CPUs, 2 GB of RAM and an SSD disk, and we used the Percona tools to optimize its configuration.

Any help is much appreciated.

Thanks

Best answer

You are using UNION, which implies DISTINCT processing: the server has to remove duplicate rows from the combined result.
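To see the difference in isolation, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in SQL engine (an assumption for portability; UNION and UNION ALL are standard SQL and behave the same way in MySQL). Both branches return the same row, which is what happens in the question's query for every client/city pair that has both kinds of events.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient SQL engine.
conn = sqlite3.connect(":memory:")

# A one-row query; both branches of the union return exactly this row.
row_sql = "SELECT 'c1' AS idClient, 'Rome, Italy' AS city"

union_rows = conn.execute(f"{row_sql} UNION {row_sql}").fetchall()
union_all_rows = conn.execute(f"{row_sql} UNION ALL {row_sql}").fetchall()

print(len(union_rows))      # 1: UNION removed the duplicate
print(len(union_all_rows))  # 2: UNION ALL kept both copies
```

Because UNION ALL keeps every row, swapping it in is only correct if the second branch never repeats a row the first branch already produced.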

It is usually better to rewrite a full join as a left join plus the non-matching rows of a right join (provided it is a proper 1:n join):

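SELECT ... -- same select list as in the question
FROM OTHERS LEFT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
UNION ALL
SELECT ... -- same select list as in the question
FROM OTHERS RIGHT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
WHERE OTHERS.idClient IS NULL -- only CLIPROFILE rows without an OTHERS match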
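A runnable sketch of this rewrite, with Python's `sqlite3` standing in for MySQL. The tables `others` and `cliprofile` are hypothetical stand-ins for the OTHERS and CLIPROFILE derived tables, filled with made-up rows. Because RIGHT JOIN only exists in recent SQLite versions, the second branch is written as the equivalent LEFT JOIN with the operands swapped; its `WHERE o.idClient IS NULL` keeps only the profile rows that found no partner, so no row is produced twice and no deduplication is needed.

```python
import sqlite3

# Hypothetical pre-aggregated rows standing in for the two derived tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE others     (idClient TEXT NOT NULL, city TEXT NOT NULL, cnt INTEGER);
CREATE TABLE cliprofile (idClient TEXT NOT NULL, city TEXT NOT NULL, cnt INTEGER);
INSERT INTO others     VALUES ('c1', 'Rome, Italy', 5), ('c2', 'Paris, France', 3);
INSERT INTO cliprofile VALUES ('c1', 'Rome, Italy', 2), ('c3', 'Oslo, Norway', 4);
""")

FULL_JOIN = """
-- Branch 1: every OTHERS row, with its CLIPROFILE match if there is one.
SELECT o.idClient, o.city,
       COALESCE(p.cnt, 0) AS events_profile,
       o.cnt              AS events_others
FROM others o
LEFT JOIN cliprofile p ON p.city = o.city AND p.idClient = o.idClient
UNION ALL
-- Branch 2: CLIPROFILE rows with no OTHERS match (anti-join). The IS NULL
-- test relies on others.idClient being NOT NULL in real rows.
SELECT p.idClient, p.city,
       p.cnt AS events_profile,
       0     AS events_others
FROM cliprofile p
LEFT JOIN others o ON p.city = o.city AND p.idClient = o.idClient
WHERE o.idClient IS NULL
"""

rows = sorted(conn.execute(FULL_JOIN).fetchall())
for row in rows:
    print(row)
# ('c1', 'Rome, Italy', 2, 5)
# ('c2', 'Paris, France', 0, 3)
# ('c3', 'Oslo, Norway', 4, 0)
```

The matched pair comes out once with both counts, and each unmatched row comes out once with a zero on the missing side, which is exactly what the FULL OUTER JOIN plus UNION produced.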

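Additionally, you could materialize the results of the derived tables in temporary tables before joining them, so that each calculation is done only once (I don't know whether MySQL's optimizer is smart enough to do that automatically).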
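A sketch of the materialization step, again with Python's `sqlite3` standing in for MySQL, a cut-down MONTH_RAW_EVENTS and made-up rows. The query as written contains four derived tables over MONTH_RAW_EVENTS, and MySQL 5.5 materializes each derived table separately, so the big table is aggregated four times; staging the two aggregates first halves that, and the join then only reads the small results. The names `stage_others` and `stage_profile` are hypothetical. Two MySQL-specific points that SQLite does not exercise: MySQL 5.5 cannot refer to a TEMPORARY table more than once in the same query (error 1137, "Can't reopen table"), and the full-join rewrite reads each staging table in both branches, so ordinary tables dropped afterwards are used here; and MySQL 5.5 does not add indexes to materialized derived tables (automatic indexing arrived in 5.6), so an explicit index on the staging tables' join columns is likely to matter. In MySQL the staging statement would be `CREATE TABLE ... SELECT` with CONCAT instead of SQLite's `||`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down MONTH_RAW_EVENTS with a few made-up January rows.
conn.executescript("""
CREATE TABLE MONTH_RAW_EVENTS (
    idEvent INTEGER PRIMARY KEY, city TEXT NOT NULL, country TEXT NOT NULL,
    ts TEXT NOT NULL, idClient TEXT NOT NULL, event_category TEXT NOT NULL
);
INSERT INTO MONTH_RAW_EVENTS VALUES
    (1, 'Rome',  'Italy',  '2014-01-05 10:00:00', 'c1', 'CLIENT PROFILE'),
    (2, 'Rome',  'Italy',  '2014-01-06 11:00:00', 'c1', 'SEARCH'),
    (3, 'Rome',  'Italy',  '2014-01-07 12:00:00', 'c1', 'SEARCH'),
    (4, 'Paris', 'France', '2014-01-08 13:00:00', 'c2', 'SEARCH');
""")

# Step 1: run each aggregation exactly once and keep the result.
# {table} and {op} are only ever filled from the hard-coded pairs below.
STAGE_SQL = """
CREATE TABLE {table} AS
SELECT idClient, city || ', ' || country AS city, COUNT(*) AS cnt
FROM MONTH_RAW_EVENTS
WHERE ts >= '2014-01-01 00:00:00' AND ts <= '2014-01-31 23:59:59'
  AND event_category {op} 'CLIENT PROFILE'
GROUP BY idClient, city || ', ' || country
"""
for table, op in (("stage_others", "!="), ("stage_profile", "=")):
    conn.execute(STAGE_SQL.format(table=table, op=op))
    # Index the join columns of each staging table.
    conn.execute(f"CREATE INDEX idx_{table} ON {table} (idClient, city)")

# Step 2: full-join emulation over the small staging tables. Each staging
# table is read twice, but MONTH_RAW_EVENTS is not scanned again.
rows = conn.execute("""
SELECT o.idClient, o.city, COALESCE(p.cnt, 0), o.cnt
FROM stage_others o
LEFT JOIN stage_profile p ON p.city = o.city AND p.idClient = o.idClient
UNION ALL
SELECT p.idClient, p.city, p.cnt, 0
FROM stage_profile p
LEFT JOIN stage_others o ON p.city = o.city AND p.idClient = o.idClient
WHERE o.idClient IS NULL
""").fetchall()

# Step 3: drop the staging tables once the summary has been written.
conn.executescript("DROP TABLE stage_others; DROP TABLE stage_profile;")

for row in sorted(rows):
    print(row)
```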

Also, it might be more efficient to group and join on city and country as separate columns, and to do the CONCAT(city,', ',country) as city in the outer step.
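A small sketch of that restructuring for the OTHERS side, with Python's `sqlite3` standing in for MySQL (SQLite's `||` plays the role of CONCAT) and made-up rows; the CLIPROFILE side would be built the same way and joined on idClient, city and country. Grouping explicitly on both columns also sidesteps a subtle point in the question's query: per the MySQL manual, names in GROUP BY are looked up among the FROM-clause columns before the select-list aliases, so `GROUP BY idClient,city` there groups on the raw `city` column rather than on the CONCAT alias, and with the MySQL 5.5 default SQL mode the `country` in the label is then taken from an indeterminate row of each group. In the sample data one client has events in two different cities named Paris, and they stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down MONTH_RAW_EVENTS (no ts column, date filter omitted) with
# made-up rows: client c1 has events in two different cities called Paris.
conn.executescript("""
CREATE TABLE MONTH_RAW_EVENTS (
    idEvent INTEGER PRIMARY KEY, city TEXT NOT NULL, country TEXT NOT NULL,
    idClient TEXT NOT NULL, event_category TEXT NOT NULL
);
INSERT INTO MONTH_RAW_EVENTS VALUES
    (1, 'Paris', 'France',        'c1', 'SEARCH'),
    (2, 'Paris', 'France',        'c1', 'SEARCH'),
    (3, 'Paris', 'United States', 'c1', 'SEARCH');
""")

# Inner step: group on the plain columns. Outer step: build the label
# (in MySQL this would be CONCAT(city, ', ', country) AS city).
rows = sorted(conn.execute("""
SELECT agg.idClient, agg.city || ', ' || agg.country AS city_label, agg.cnt
FROM (
    SELECT idClient, city, country, COUNT(*) AS cnt
    FROM MONTH_RAW_EVENTS
    WHERE event_category != 'CLIENT PROFILE'
    GROUP BY idClient, city, country
) AS agg
""").fetchall())

for row in rows:
    print(row)
# ('c1', 'Paris, France', 2)
# ('c1', 'Paris, United States', 1)
```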

Original question on Stack Overflow (mysql - Optimizing a MySQL full outer join for massive data): https://stackoverflow.com/questions/22284887/
