
mysql - Detecting near-duplicates above a threshold

Reposted. Author: 行者123. Updated: 2023-11-30 23:18:24

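I want to be able to query a table for records that I suspect may be near-duplicates.

I've racked my brain and can't work out where to start, so I've simplified the problem as much as I can and am asking here!

Here is my simplified table: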

CREATE TABLE sales
(
`id1` int auto_increment primary key,
`amount` decimal(6,2),
`date` datetime
);

Here are some test values:

INSERT INTO sales
(`amount`, `date`)
VALUES
(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(3, '2013-05-15T12:12:00'),
(4, '2013-05-15T12:12:12'),
(45, '2013-05-15T12:22:00'),
(4, '2013-05-15T12:24:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),
(10, '2013-05-15T14:00:00');

Question

I want to return sales above Y where a neighbouring sale, also above Y, was recorded within X minutes.

That is, from this data:

amt, date
(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(3, '2013-05-15T12:12:00'),
(4, '2013-05-15T12:12:12'),
(45, '2013-05-15T12:22:00'),
(4, '2013-05-15T12:24:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),
(10, '2013-05-15T14:00:00');

where @yVal = 5 and @xMins = 10

the expected result is:

(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),

I've put the above into a fiddle: http://sqlfiddle.com/#!2/cf8fe

Any help would be much appreciated!

Best answer

Try something like this:

-- @xVal: allowed difference in amount; @xMins: allowed difference in minutes
SELECT DISTINCT s1.*
FROM sales s1
LEFT JOIN sales s2
  ON (s1.id1 != s2.id1
      AND s1.amount >= s2.amount - @xVal AND s1.amount <= s2.amount + @xVal
      AND s1.date >= DATE_SUB(s2.date, INTERVAL @xMins MINUTE)
      AND s1.date <= DATE_ADD(s2.date, INTERVAL @xMins MINUTE)
  )
WHERE
  s2.id1 IS NOT NULL  -- keep only rows that found a matching partner

Extended

Fixed some errors.

The result for your data is:

+-----+--------+---------------------+
| id1 | amount | date                |
+-----+--------+---------------------+
| 1 | 10.00 | 2013-05-15 11:11:00 |
| 2 | 11.00 | 2013-05-15 11:11:11 |
| 4 | 3.00 | 2013-05-15 12:12:00 |
| 5 | 4.00 | 2013-05-15 12:12:12 |
| 8 | 8.00 | 2013-05-15 13:00:00 |
| 9 | 9.00 | 2013-05-15 13:01:00 |
+-----+--------+---------------------+
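
The table above can be checked without a MySQL server. The sketch below is an assumption-laden port, not the answer's own code: it uses Python's built-in sqlite3 module, replaces the MySQL user variables with named parameters, and compares timestamps in seconds via strftime('%s', ...) instead of DATE_SUB/DATE_ADD. The function name near_duplicate_ids is hypothetical, and the values 5 and 10 are taken from the question because the answer does not say which values it used.

```python
import sqlite3

# Test rows copied from the question.
ROWS = [
    (10, "2013-05-15T11:11:00"),
    (11, "2013-05-15T11:11:11"),
    (20, "2013-05-15T11:22:00"),
    (3, "2013-05-15T12:12:00"),
    (4, "2013-05-15T12:12:12"),
    (45, "2013-05-15T12:22:00"),
    (4, "2013-05-15T12:24:00"),
    (8, "2013-05-15T13:00:00"),
    (9, "2013-05-15T13:01:00"),
    (10, "2013-05-15T14:00:00"),
]

# SQLite port of the first query: a row qualifies when some other row has
# an amount within x_val of it and a timestamp within x_mins minutes of it.
QUERY = """
    SELECT DISTINCT s1.id1
    FROM sales s1
    JOIN sales s2
      ON s1.id1 != s2.id1
     AND ABS(s1.amount - s2.amount) <= :x_val
     AND ABS(CAST(strftime('%s', s1.date) AS INTEGER)
           - CAST(strftime('%s', s2.date) AS INTEGER)) <= :x_mins * 60
    ORDER BY s1.id1
"""


def near_duplicate_ids(x_val, x_mins):
    """Return the id1 values of rows that have a near-duplicate partner."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id1 INTEGER PRIMARY KEY AUTOINCREMENT, amount NUMERIC, date TEXT)"
    )
    conn.executemany("INSERT INTO sales (amount, date) VALUES (?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY, {"x_val": x_val, "x_mins": x_mins})]
    conn.close()
    return ids


if __name__ == "__main__":
    print(near_duplicate_ids(5, 10))  # [1, 2, 4, 5, 8, 9]
```

With an amount tolerance of 5 and a 10-minute window this returns the same six ids as the table. It also shows why this first query does not answer the question as asked: it matches rows whose amounts are close to each other, so the small sales with id1 4 and 5 are included even though both are below the threshold.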

Extended 2

-- Here @xVal is the minimum amount (the question's @yVal)
SELECT DISTINCT s1.*
FROM sales s1
LEFT JOIN sales s2
  ON (s1.id1 != s2.id1
      AND s2.amount >= @xVal
      AND s1.date >= DATE_SUB(s2.date, INTERVAL @xMins MINUTE)
      AND s1.date <= DATE_ADD(s2.date, INTERVAL @xMins MINUTE)
  )
WHERE
  s2.id1 IS NOT NULL
  AND s1.amount >= @xVal
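
To see what this second query returns for the question's data, here is a minimal sketch under the same kind of assumptions: Python's built-in sqlite3 module stands in for MySQL, named parameters stand in for the user variables, and timestamps are compared in seconds via strftime('%s', ...) rather than DATE_SUB/DATE_ADD. The function name sales_over_threshold_ids and the parameter names y_val and x_mins are hypothetical.

```python
import sqlite3

# Test rows copied from the question.
ROWS = [
    (10, "2013-05-15T11:11:00"),
    (11, "2013-05-15T11:11:11"),
    (20, "2013-05-15T11:22:00"),
    (3, "2013-05-15T12:12:00"),
    (4, "2013-05-15T12:12:12"),
    (45, "2013-05-15T12:22:00"),
    (4, "2013-05-15T12:24:00"),
    (8, "2013-05-15T13:00:00"),
    (9, "2013-05-15T13:01:00"),
    (10, "2013-05-15T14:00:00"),
]

# SQLite port of the threshold query: both rows of a pair must be at or
# above y_val, and their timestamps must be within x_mins minutes.
QUERY = """
    SELECT DISTINCT s1.id1
    FROM sales s1
    JOIN sales s2
      ON s1.id1 != s2.id1
     AND s2.amount >= :y_val
     AND ABS(CAST(strftime('%s', s1.date) AS INTEGER)
           - CAST(strftime('%s', s2.date) AS INTEGER)) <= :x_mins * 60
    WHERE s1.amount >= :y_val
    ORDER BY s1.id1
"""


def sales_over_threshold_ids(y_val, x_mins):
    """Return id1 of sales >= y_val with another sale >= y_val close in time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id1 INTEGER PRIMARY KEY AUTOINCREMENT, amount NUMERIC, date TEXT)"
    )
    conn.executemany("INSERT INTO sales (amount, date) VALUES (?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY, {"y_val": y_val, "x_mins": x_mins})]
    conn.close()
    return ids


if __name__ == "__main__":
    print(sales_over_threshold_ids(5, 10))  # [1, 2, 8, 9]
    print(sales_over_threshold_ids(5, 11))  # [1, 2, 3, 8, 9]
```

With a threshold of 5 and a 10-minute window the sketch returns ids 1, 2, 8 and 9. The 20.00 sale at 11:22:00 is left out because it was recorded 10 minutes 49 seconds after the 11.00 sale, just outside the window. Widening the window to 11 minutes returns ids 1, 2, 3, 8 and 9, which are the rows listed in the question's expected result.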

Regarding mysql - detecting near-duplicates above a threshold, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/16561980/
