
mysql - Optimizing SELECT count(DISTINCT ip)

Reposted. Author: 行者123. Updated: 2023-11-29 06:13:05

I'm trying to get summarized results (the total number of unique IPs) from a table that receives about 2 million new rows each day.

The table:

CREATE TABLE `clicks` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`hash` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`type` enum('popunder','gallery','exit','direct') COLLATE utf8_unicode_ci NOT NULL,
`impression_time` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`source_user_id` int(11) NOT NULL,
`destination_user_id` int(11) NOT NULL,
`destination_campaign_id` int(11) NOT NULL,
`destination_campaign_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`destination_campaign_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`ip` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`referrer` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`country_code` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`country_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`country` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`isp` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`category_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`category` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`bid` float(8,2) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `ip` (`ip`),
KEY `source_user_id` (`source_user_id`),
KEY `destination_user_id` (`destination_user_id`),
KEY `destination_campaign_id` (`destination_campaign_id`),
KEY `clicks_hash_index` (`hash`),
KEY `clicks_created_at_index` (`created_at`),
KEY `campaign_date` (`destination_campaign_id`,`created_at`),
KEY `source_user_date` (`source_user_id`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=301539660 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

My query:

SELECT SUM(ips_by_date.count) as count, ips_by_date.date as date
FROM (SELECT count(DISTINCT ip) as count, DATE(created_at) as date
FROM clicks as clicks
WHERE created_at BETWEEN '2016-05-22 00:00:00' AND '2016-05-23 23:59:59'
GROUP BY DATE(created_at)) as ips_by_date
GROUP BY date;

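Right now this query takes 93 seconds to run for just one day of data, so I feel I am missing something.

Is there any optimization I can make to speed up this simple count?

Thanks.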

Best answer

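First, I don't see why the subquery is necessary. The inner query already returns one row per date, so there is no need to aggregate again. Second, your query covers two days rather than one, but I get the point about performance.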

So, let's start with:

SELECT count(DISTINCT ip) as count, DATE(created_at) as date 
FROM clicks
WHERE created_at BETWEEN '2016-05-22 00:00:00' AND '2016-05-23 23:59:59'
GROUP BY DATE(created_at);

For this query, you want an index on clicks(created_at, ip). Also note that I would write it as:

SELECT count(DISTINCT ip) as count, DATE(created_at) as date 
FROM clicks
WHERE created_at >= '2016-05-22' AND created_at < '2016-05-24'
GROUP BY DATE(created_at);

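This should show some improvement, but I don't think it will get fundamentally better, because the GROUP BY aggregation still requires a filesort.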
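As a rough sketch, the index on clicks(created_at, ip) could be added and the plan checked as shown here. The index name clicks_created_at_ip_index is hypothetical (chosen only to resemble the existing index names), and the actual EXPLAIN output depends on the MySQL version and the data, so treat this as something to try rather than a guaranteed result.

```sql
-- Hypothetical index name; the column order (created_at, ip) matches the
-- index suggested in the answer. On a table this size the build can take
-- a long time, so it is worth trying on a copy first.
ALTER TABLE clicks
  ADD INDEX clicks_created_at_ip_index (created_at, ip);

-- Inspect the plan for the rewritten query. If the new index is chosen,
-- the query can be answered from the index alone, without reading the
-- wide table rows.
EXPLAIN
SELECT count(DISTINCT ip) AS count, DATE(created_at) AS date
FROM clicks
WHERE created_at >= '2016-05-22' AND created_at < '2016-05-24'
GROUP BY DATE(created_at);
```

Because the index contains both columns the query reads, it can act as a covering index for the range scan on created_at. Grouping on the expression DATE(created_at) is what would still be expected to need the extra sort mentioned above.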

This question and answer on optimizing SELECT count(DISTINCT ip) in MySQL come from a similar question on Stack Overflow: https://stackoverflow.com/questions/37381066/
