
mysql - How to check every row in MySQL (timestamp - 60 seconds) to determine whether duplicate data exists?

Reposted. Author: 行者123. Updated: 2023-11-29 09:54:40

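I have a table like this:

[image: sample table data]

The rows marked with a red cross are the result I want. I want to move those rows to an error log table, because they indicate duplicate data.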

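A row is a duplicate when:

  1. Another row exists within the 60 seconds before this row's timestamp
  2. That row has the same advertiser_id, offer_id, commission_id, commission_tier_id, creative_id, publisher_id, publisher_asset_id, and source_id

Example: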

1545981655
1545981657 x -> marked as a duplicate: 1545981657 - 60 = 1545981597. Search for the first row with a timestamp > 1545981597, excluding this row; 1545981655 is returned.
1545981660 x -> marked as a duplicate: 1545981660 - 60 = 1545981600. Search for the first row with a timestamp > 1545981600, excluding this row; 1545981655 is returned.
1545981662 x -> marked as a duplicate: 1545981662 - 60 = 1545981602. Search for the first row with a timestamp > 1545981602, excluding this row; 1545981655 is returned.
1545981707 -> not marked as a duplicate: 1545981707 - 60 = 1545981647. Search for the first row with a timestamp > 1545981647, excluding this row; 1545981655 is not returned because its publisher_asset_id is different.
1545981710 x -> marked as a duplicate: 1545981710 - 60 = 1545981650. Search for the first row with a timestamp > 1545981650, excluding this row; 1545981707 is returned.
1545981712 x -> marked as a duplicate: 1545981712 - 60 = 1545981652. Search for the first row with a timestamp > 1545981652, excluding this row; 1545981707 is returned.
1545981714 x -> marked as a duplicate: 1545981714 - 60 = 1545981654. Search for the first row with a timestamp > 1545981654, excluding this row; 1545981707 is returned.
1545981718 -> not marked as a duplicate: 1545981718 - 60 = 1545981658. Search for the first row with a timestamp > 1545981658, excluding this row; no row is returned, because publisher_asset_id is different.
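To try the example above against a real table, a small fixture along the following lines may help. This is only a sketch: the table name `T`, the column types, and every key value are assumptions made for illustration (the original screenshots are not available). The only thing taken from the question is the list of timestamps and the fact that `publisher_asset_id` changes at 1545981707 and again at 1545981718.

```sql
-- Hypothetical schema: column names come from the question,
-- types and key values are assumed.
CREATE TABLE T (
    id                   INT    NOT NULL PRIMARY KEY,
    advertiser_id        INT    NOT NULL,
    offer_id             INT    NOT NULL,
    commission_id        INT    NOT NULL,
    commission_tier_id   INT    NOT NULL,
    creative_id          INT    NOT NULL,
    publisher_id         INT    NOT NULL,
    publisher_asset_id   INT    NOT NULL,
    source_id            INT    NOT NULL,
    impression_timestamp BIGINT NOT NULL  -- Unix epoch seconds
);

-- Timestamps from the example; only publisher_asset_id varies (assumed values 1, 2, 3).
INSERT INTO T VALUES
    (1, 1, 1, 1, 1, 1, 1, 1, 1, 1545981655),
    (2, 1, 1, 1, 1, 1, 1, 1, 1, 1545981657),
    (3, 1, 1, 1, 1, 1, 1, 1, 1, 1545981660),
    (4, 1, 1, 1, 1, 1, 1, 1, 1, 1545981662),
    (5, 1, 1, 1, 1, 1, 1, 2, 1, 1545981707),
    (6, 1, 1, 1, 1, 1, 1, 2, 1, 1545981710),
    (7, 1, 1, 1, 1, 1, 1, 2, 1, 1545981712),
    (8, 1, 1, 1, 1, 1, 1, 2, 1, 1545981714),
    (9, 1, 1, 1, 1, 1, 1, 3, 1, 1545981718);
```

Under the rule stated in the question, the rows expected to be flagged in this fixture are ids 2, 3, 4, 6, 7, and 8, matching the lines marked with x.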

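How can I do this in a MySQL query, rather than looping over the whole data set?

I want to achieve a result like this:

[image: desired result table]

I need your help. Thank you very much.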

Best answer

Replace T with the name of your table and try the following:

    SELECT *
    FROM (
        SELECT id, advertiser_id, offer_id, commission_id, commission_tier_id,
               creative_id, publisher_id, publisher_asset_id, source_id, impression_timestamp,
               COUNT(*) OVER (
                   PARTITION BY advertiser_id, offer_id, commission_id, commission_tier_id,
                                creative_id, publisher_id, publisher_asset_id, source_id
                   ORDER BY impression_timestamp
                   RANGE 60 PRECEDING
               ) AS DuplicateFlag
        FROM T
    ) DetectDuplicate
    WHERE DuplicateFlag > 1
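To see which earlier row causes a given row to be flagged, the same window can be written with an explicit frame and a named `WINDOW` clause. The following is a sketch for MySQL 8 or later, assuming `impression_timestamp` is a numeric column holding whole seconds; the output aliases `rows_in_window` and `earliest_in_window` are names chosen here for illustration.

```sql
SELECT id,
       impression_timestamp,
       -- Rows with the same key in [timestamp - 60, timestamp], including this row.
       COUNT(*) OVER w AS rows_in_window,
       -- Earliest timestamp in that frame; equals the row's own timestamp
       -- when no earlier matching row exists.
       FIRST_VALUE(impression_timestamp) OVER w AS earliest_in_window
FROM T
WINDOW w AS (
    PARTITION BY advertiser_id, offer_id, commission_id, commission_tier_id,
                 creative_id, publisher_id, publisher_asset_id, source_id
    ORDER BY impression_timestamp
    RANGE BETWEEN 60 PRECEDING AND CURRENT ROW
)
ORDER BY impression_timestamp;
```

`RANGE 60 PRECEDING` is shorthand for this frame, and both bounds are inclusive. Rows that share the same key and the same timestamp are peers, so each of them counts the other. If the strict comparison from the question (> timestamp - 60) matters and timestamps are whole seconds, `59 PRECEDING` would express it, though that is an assumption about the data.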

Edit: Before MySQL 8, the query above cannot be used and has to be replaced by a query with a JOIN (unfortunately somewhat slower):

    -- T2 is the candidate duplicate; T1 is another row with the same key
    -- whose timestamp falls in the 60 seconds up to and including T2's.
    SELECT DISTINCT T2.*
    FROM T T1
    INNER JOIN T T2
        ON T1.id <> T2.id
        AND T1.advertiser_id = T2.advertiser_id
        AND T1.offer_id = T2.offer_id
        AND T1.commission_id = T2.commission_id
        AND T1.commission_tier_id = T2.commission_tier_id
        AND T1.creative_id = T2.creative_id
        AND T1.publisher_id = T2.publisher_id
        AND T1.publisher_asset_id = T2.publisher_asset_id
        AND T1.source_id = T2.source_id
        AND T1.impression_timestamp BETWEEN T2.impression_timestamp - 60 AND T2.impression_timestamp
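Part of the cost of the self-join likely comes from scanning the table once per row. A composite index on the eight key columns followed by the timestamp may let MySQL look up matching rows directly. This is a sketch, not a measured result: the index name `idx_t_dedup_key` is made up here, and whether the optimizer uses the index depends on the data and the MySQL version, which `EXPLAIN` can confirm.

```sql
-- Hypothetical index: equality columns first, the range column (timestamp) last.
CREATE INDEX idx_t_dedup_key
    ON T (advertiser_id, offer_id, commission_id, commission_tier_id,
          creative_id, publisher_id, publisher_asset_id, source_id,
          impression_timestamp);
```

Putting `impression_timestamp` last matters because the other eight columns are compared with `=` while the timestamp is compared as a range.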

At least one other syntax is possible, for example:

    SELECT *
    FROM T Main
    WHERE EXISTS (
        SELECT 1
        FROM T
        WHERE id <> Main.id
        AND advertiser_id = Main.advertiser_id
        AND offer_id = Main.offer_id
        AND commission_id = Main.commission_id
        AND commission_tier_id = Main.commission_tier_id
        AND creative_id = Main.creative_id
        AND publisher_id = Main.publisher_id
        AND publisher_asset_id = Main.publisher_asset_id
        AND source_id = Main.source_id
        -- Only rows in the 60 seconds up to and including Main's timestamp count.
        AND impression_timestamp BETWEEN Main.impression_timestamp - 60 AND Main.impression_timestamp
    )
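The question also asks to move the flagged rows into an error log table, which the queries above do not do. One possible way is to copy the rows and then delete them inside a transaction. This is a sketch under assumptions: the table name `error_log` is hypothetical, it is assumed to have the same columns as `T`, and both tables are assumed to use a transactional engine such as InnoDB. The `EXISTS` test here only matches other rows with the same key in the 60 seconds up to and including the row's own timestamp.

```sql
-- Hypothetical error-log table with the same structure as T.
CREATE TABLE IF NOT EXISTS error_log LIKE T;

START TRANSACTION;

-- Copy every row that has another matching row within the preceding 60 seconds.
INSERT INTO error_log
SELECT Main.*
FROM T AS Main
WHERE EXISTS (
    SELECT 1
    FROM T AS Prev
    WHERE Prev.id <> Main.id
      AND Prev.advertiser_id = Main.advertiser_id
      AND Prev.offer_id = Main.offer_id
      AND Prev.commission_id = Main.commission_id
      AND Prev.commission_tier_id = Main.commission_tier_id
      AND Prev.creative_id = Main.creative_id
      AND Prev.publisher_id = Main.publisher_id
      AND Prev.publisher_asset_id = Main.publisher_asset_id
      AND Prev.source_id = Main.source_id
      AND Prev.impression_timestamp
          BETWEEN Main.impression_timestamp - 60 AND Main.impression_timestamp
);

-- Remove the copied rows from T by joining on the logged ids.
DELETE T
FROM T
INNER JOIN error_log AS E ON E.id = T.id;

COMMIT;
```

The delete joins against `error_log` instead of repeating the subquery because MySQL does not allow a `DELETE` to select from its own target table in a subquery. If `error_log` already holds rows from earlier runs, any row in `T` with a matching id would be deleted as well, so this step may need an extra condition.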

Regarding "mysql - How to check every row in MySQL (timestamp - 60 seconds) to determine whether duplicate data exists?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54068375/
