
mysql - Find rows that have 3 of 5 fields in common - how to speed up the query?

Reposted. Author: 行者123. Last updated: 2023-11-30 23:36:02

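The query below works correctly but is slow. On a table of about 7,500 rows it takes about 30 seconds to run. How can I speed it up?

The goal is to find "near-duplicate" rows within the same table. Two rows count as a hit when at least 3 of the 5 fields match.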
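The "3 of 5" rule relies on each comparison evaluating to 1 (true) or 0 (false), so adding the comparisons counts the matching fields. The following is a minimal sketch of that idea only. It assumes Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL, and the sample values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Each comparison yields 1 (true) or 0 (false); their sum is the number of matches.
row = conn.execute(
    """
    SELECT ('Anna' = 'Anna')              -- firstname matches
         + ('Meier' = 'Meyer')            -- lastname differs
         + ('Main St 1' = 'Main St 1')    -- address matches
         + ('Berlin' = 'Berlin')          -- city matches
         + ('a@example.org' = 'b@example.org')  -- email differs
           AS matching_fields
    """
).fetchone()

matching_fields = row[0]
is_near_duplicate = matching_fields >= 3  # the "at least 3 of 5" rule
print(matching_fields, is_near_duplicate)
```

Here three of the five comparisons are true, so the pair would count as a near-duplicate.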

SELECT
    originalTable.id,
    originalTable.lastname,
    originalTable.firstname,
    originalTable.address,
    originalTable.city,
    originalTable.email
FROM
    address AS originalTable,
    address AS compareTable
WHERE
    # do not find the same record
    originalTable.id != compareTable.id AND

    # at least 3 out of those 5 should match
    (originalTable.firstname = compareTable.firstname) +
    (originalTable.lastname = compareTable.lastname) +
    (originalTable.address = compareTable.address AND originalTable.address != '') +
    (originalTable.city = compareTable.city AND originalTable.city != '') +
    (originalTable.email = compareTable.email AND originalTable.email != '')
    >= 3
GROUP BY
    originalTable.id
ORDER BY
    originalTable.lastname ASC,
    originalTable.firstname ASC,
    originalTable.city ASC

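Any optimization tips are appreciated.

Best answer

You do need a Cartesian product here, that's right. I came up with the following solution: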

CREATE TABLE address_dups (INDEX (is_duplicate)) ENGINE=MEMORY
SELECT
    # alias both ids: CREATE TABLE ... SELECT rejects duplicate column names
    originalTable.id AS original_id,
    compareTable.id AS compare_id,
    (
        (originalTable.firstname = compareTable.firstname) +
        (originalTable.lastname = compareTable.lastname) +
        (originalTable.address = compareTable.address AND originalTable.address != '') +
        (originalTable.city = compareTable.city AND originalTable.city != '') +
        (originalTable.email = compareTable.email AND originalTable.email != '')
        >= 3
    ) AS is_duplicate
FROM
    address AS originalTable,
    address AS compareTable
WHERE originalTable.id != compareTable.id;

SELECT * FROM address_dups WHERE is_duplicate = 1;

For each row ID, this gives you the IDs of the fuzzy-duplicate rows you asked for.
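To try the pairwise comparison on a small scale, here is a hedged sketch using Python's built-in `sqlite3` module as a stand-in for MySQL. The four sample rows and the aliases `o` and `c` are invented for illustration. It also swaps `!=` for `o.id < c.id`, which is a variation and not part of the answer above: each unordered pair is then compared once instead of twice, at the cost of listing each pair in one direction only.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        firstname TEXT, lastname TEXT, address TEXT, city TEXT, email TEXT
    );
    -- Hypothetical sample rows; empty strings model missing values.
    INSERT INTO address VALUES
        (1, 'Anna', 'Meier', 'Main St 1', 'Berlin', 'anna@example.org'),
        (2, 'Anna', 'Meyer', 'Main St 1', 'Berlin', ''),
        (3, 'Bob',  'Stone', '',          '',       ''),
        (4, 'Bob',  'Stone', '',          '',       'bob@example.org');
    """
)

# o.id < c.id visits each unordered pair once (variation on the != condition).
pairs = conn.execute(
    """
    SELECT o.id, c.id
    FROM address AS o
    JOIN address AS c ON o.id < c.id
    WHERE (o.firstname = c.firstname)
        + (o.lastname = c.lastname)
        + (o.address = c.address AND o.address != '')
        + (o.city = c.city AND o.city != '')
        + (o.email = c.email AND o.email != '')
        >= 3
    ORDER BY o.id, c.id
    """
).fetchall()
print(pairs)
```

Rows 1 and 2 share first name, address, and city, so they are reported. Rows 3 and 4 share only first and last name, because empty address and city values are excluded from the count, so they fall short of the threshold.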

Regarding "mysql - Find rows that have 3 of 5 fields in common - how to speed up the query?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7284014/
