
mysql - Deleting duplicate data in MySQL

Reposted. Author: 行者123. Updated: 2023-11-30 22:57:10

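I'm trying to emulate the accepted answer to this SO question: Delete all Duplicate Rows except for One in MySQL? [duplicate], with a twist: I want the data in one table (the auto-increment ID) to determine which rows are deleted in another table. SQLFiddle here showing data.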

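In the fiddle referenced above, the end result I'm looking for is that the rows in eventdetails_new with Event_ID = 4 and 6 are deleted (EVENTDETAILS_ID 5 and 6, and 9 and 10), leaving the rows for Event_ID 3 and 5 (EVENTDETAILS_ID 3 and 4, and 7 and 8). I hope that makes sense. Ideally, the rows in events_new with the same Event_ID would be deleted as well (I haven't started on that yet, so there is no code sample).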

Here is the query I'm trying to get working, but I'm a bit over my head:

SELECT *
FROM eventdetails_new AS EDN1, eventdetails_new AS EDN2
INNER JOIN events_new AS E1 ON `E1`.`Event_ID` = `EDN1`.`Event_ID`
INNER JOIN events_new AS E2 ON `E2`.`Event_ID` = `EDN2`.`Event_ID`
WHERE `E1`.`Event_ID` > `E2`.`Event_ID`
AND `E1`.`DateTime` = `E2`.`DateTime`
AND events_new.EventType_ID = 6;

Here too is a SQLFiddle with the results of this query. Not good. I can see Event_ID in the data, but for some reason I can't query on it. I'm not sure how to proceed from here.

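I know this is a SELECT query, but I couldn't figure out a way to have two aliased tables in a DELETE query (which I think I need?). I figured that if I could get a selection, I could delete the rows with some C# code. Ideally, though, this could all be done in a single query or a set of statements without having to leave MySQL.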

Here is my first cut at the DELETE query, but it's just as bad:

DELETE e1 FROM eventdetails_new e1 
WHERE `events_new`.`Event_ID` > `events_new`.`Event_ID`
AND events_new.DateTime = events_new.DateTime AND events_new.EventType_ID = 6;

SQLFiddle won't let me run this query at all, so it's not much help. It does, however, give me the same error as the one above: Error Code: 1054. Unknown column 'events_new.Event_ID' in 'where clause'

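I'm by no means attached to either of these queries if there is a better way. The end result I'm looking for is to get a bunch of duplicate data deleted.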

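I have hundreds of thousands of these results, and I know that roughly 1/3 of them are duplicates that need to be removed before the database can be used.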
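
A note on the error, offered as a sketch and not a verified fix: error 1054 most likely appears because MySQL requires an aliased table to be referenced by its alias. In the SELECT, events_new is only available as E1 and E2, and the DELETE never joins events_new at all. Mixing the comma join with INNER JOIN can also hide EDN1 from the first ON clause, because JOIN binds more tightly than the comma. Assuming duplicates are events that share DateTime and EventType_ID, and that the copy with the lowest Event_ID is the one to keep, the join could be written like this:

```sql
-- Preview: detail rows that belong to a later copy of a duplicated event.
SELECT DISTINCT EDN.*
FROM eventdetails_new AS EDN
INNER JOIN events_new AS E1
        ON E1.Event_ID = EDN.Event_ID
INNER JOIN events_new AS E2
        ON E2.`DateTime` = E1.`DateTime`
       AND E2.EventType_ID = E1.EventType_ID  -- assumption: duplicates share the event type
       AND E2.Event_ID < E1.Event_ID          -- an earlier copy exists, so E1 is the duplicate
WHERE E1.EventType_ID = 6;

-- The same join as a multi-table DELETE. Only eventdetails_new (alias EDN)
-- loses rows; events_new is joined purely to decide which rows qualify.
DELETE EDN
FROM eventdetails_new AS EDN
INNER JOIN events_new AS E1
        ON E1.Event_ID = EDN.Event_ID
INNER JOIN events_new AS E2
        ON E2.`DateTime` = E1.`DateTime`
       AND E2.EventType_ID = E1.EventType_ID
       AND E2.Event_ID < E1.Event_ID
WHERE E1.EventType_ID = 6;
```

Running the SELECT first shows which rows the DELETE would remove.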

Best answer

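Here is what I ended up doing. My co-worker and I came up with a query that gives us a list of Event_IDs that have duplicated data (we actually used Access 2010's query builder and MySQL-ified the result). Keep in mind that this is a complete solution, whereas the original question wasn't as detailed about the linked tables. If you have any questions about it, feel free to ask and I'll do my best to help: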

SELECT `Events_new`.`Event_ID`
FROM Events_new
GROUP BY `Events_new`.`PCBID`, `Events_new`.`EventType_ID`, `Events_new`.`DateTime`, `Events_new`.`User`
HAVING (((COUNT(`Events_new`.`PCBID`)) > 1) AND ((COUNT(`Events_new`.`User`)) > 1) AND ((COUNT(`Events_new`.`DateTime`)) > 1))
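
One caveat worth sketching: Event_ID is selected but is not part of the GROUP BY. With the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5), MySQL rejects such a query; with it disabled, MySQL returns a single, arbitrarily chosen Event_ID per group. That would mean each run finds only one copy per duplicated group. A variant for inspecting every copy in each group, assuming none of the four grouping columns is NULL, might look like this (keep_id, copies and all_ids are names made up for illustration):

```sql
-- One row per duplicated group, showing the ID to keep and all IDs in the group.
SELECT `PCBID`,
       `EventType_ID`,
       `DateTime`,
       `User`,
       MIN(`Event_ID`)                            AS keep_id,  -- lowest ID, assumed to be the original
       COUNT(*)                                   AS copies,
       GROUP_CONCAT(`Event_ID` ORDER BY `Event_ID`) AS all_ids
FROM Events_new
GROUP BY `PCBID`, `EventType_ID`, `DateTime`, `User`
HAVING COUNT(*) > 1;
```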

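From this, I processed each Event_ID to remove the duplicates iteratively. Basically, I had to delete all of the child rows starting from the lowest-level table so that I wouldn't run afoul of the foreign key constraints.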

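The following code was written in LINQPad as C# statements (sbCommonFunctions is an in-house DLL designed to make most, but not all as you'll see, database functions work the same way or more easily):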

// Namespaces needed: System.Data, System.Collections.Generic, MySql.Data.MySqlClient
// (in LINQPad's "C# Statements" mode they are imported through the query's properties).
sbCommonFunctions.Database testDB = new sbCommonFunctions.Database();
testDB.Connect("production", "database", "user", "password");
// One list of primary keys per table, filled from the top of the hierarchy down.
List<string> listEventIDs = new List<string>();
List<string> listEventDetailIDs = new List<string>();
List<string> listTestInformationIDs = new List<string>();
List<string> listTestStepIDs = new List<string>();
List<string> listMeasurementIDs = new List<string>();
// Finds the Event_IDs that have duplicated data.
string dtQuery = @"SELECT `Events_new`.`Event_ID`
    FROM Events_new
    GROUP BY `Events_new`.`PCBID`,
             `Events_new`.`EventType_ID`,
             `Events_new`.`DateTime`,
             `Events_new`.`User`
    HAVING (((COUNT(`Events_new`.`PCBID`)) > 1)
        AND ((COUNT(`Events_new`.`User`)) > 1)
        AND ((COUNT(`Events_new`.`DateTime`)) > 1))";

int iterations = 0;
DataTable dtEventIDs = getDT(dtQuery, testDB);
// Repeat until the query finds no more duplicated events.
while (dtEventIDs.Rows.Count > 0)
{
    Console.WriteLine(dtEventIDs.Rows.Count);
    Console.WriteLine(iterations);
    iterations++;
    // Drop the IDs collected in the previous pass; those rows are already deleted.
    listEventIDs.Clear();
    listEventDetailIDs.Clear();
    listTestInformationIDs.Clear();
    listTestStepIDs.Clear();
    listMeasurementIDs.Clear();
    foreach (DataRowView eventID in dtEventIDs.DefaultView)
    {
        listEventIDs.Add(eventID.Row[0].ToString());
        // Row[0] relies on the ID being the first column of EventDetails_new.
        DataTable dtEventDetails = testDB.QueryDatabase(String.Format(
            "SELECT * FROM EventDetails_new WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach (DataRowView drvEventDetail in dtEventDetails.DefaultView)
        {
            listEventDetailIDs.Add(drvEventDetail.Row[0].ToString());
        }
        DataTable dtTestInformation = testDB.QueryDatabase(String.Format(
            @"SELECT TestInformation_ID
              FROM TestInformation_new
              WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach (DataRowView drvTest in dtTestInformation.DefaultView)
        {
            listTestInformationIDs.Add(drvTest.Row[0].ToString());
            DataTable dtTestSteps = testDB.QueryDatabase(String.Format(
                @"SELECT TestSteps_ID
                  FROM TestSteps_new
                  WHERE TestInformation_TestInformation_ID = {0}",
                drvTest.Row[0]));
            foreach (DataRowView drvTestStep in dtTestSteps.DefaultView)
            {
                listTestStepIDs.Add(drvTestStep.Row[0].ToString());
                DataTable dtMeasurements = testDB.QueryDatabase(String.Format(
                    @"SELECT Measurements_ID
                      FROM Measurements_new
                      WHERE TestSteps_TestSteps_ID = {0}",
                    drvTestStep.Row[0]));
                foreach (DataRowView drvMeasurements in dtMeasurements.DefaultView)
                {
                    listMeasurementIDs.Add(drvMeasurements.Row[0].ToString());
                }
            }
        }
    }
    // testDB is queried again at the end of the loop; this relies on
    // sbCommonFunctions handling the reconnect (assumption, the DLL is internal).
    testDB.Disconnect();
    string mysqlConnection =
        "server=server;\ndatabase=database;\npassword=password;\nUser ID=user;";
    MySqlConnection connection = new MySqlConnection(mysqlConnection);
    connection.Open();
    //start unwinding the duplicates from the lowest level upward
    whackDuplicates(listMeasurementIDs, "measurements_new", "Measurements_ID", connection);
    whackDuplicates(listTestStepIDs, "teststeps_new", "TestSteps_ID", connection);
    whackDuplicates(listTestInformationIDs, "testinformation_new", "testInformation_ID", connection);
    whackDuplicates(listEventDetailIDs, "eventdetails_new", "eventdetails_ID", connection);
    whackDuplicates(listEventIDs, "events_new", "event_ID", connection);
    connection.Close();
    //update iterator from inside the clause in case there are more duplicates.
    dtEventIDs = getDT(dtQuery, testDB);
}

}//goofy curly brace to allow LinqPAD to deal with inline classes
// Deletes one row per ID from the given table.
public void whackDuplicates(List<string> listOfIDs,
                            string table,
                            string pkID,
                            MySqlConnection connection)
{
    foreach (string ID in listOfIDs)
    {
        using (MySqlCommand command = connection.CreateCommand())
        {
            // Table and column names cannot be bound as parameters, so they are
            // concatenated; only the ID value is passed as a parameter.
            command.CommandText = "DELETE FROM " + table + " WHERE " + pkID + " = @id";
            command.Parameters.AddWithValue("@id", ID);
            command.ExecuteNonQuery();
        }
    }
}
public DataTable getDT(string query, sbCommonFunctions.Database db)
{
    return db.QueryDatabase(query);
/* The closing brace of getDT is deliberately left out: LinqPAD has a weird way
   of dealing with inline classes, and the last one can't have a closing curly
   brace (and the first one needs an extra closing curly brace above it, go
   figure). */

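Basically, this is one giant while loop whose condition is refreshed from inside the loop body until the number of Event_IDs drops to zero (it took 5 iterations; some of the data had as many as 6 duplicates).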
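
For comparison, the same bottom-up cleanup can probably be done without leaving MySQL. The following is an untested sketch. It assumes the table and key column names used in the C# code above, InnoDB tables, an integer Event_ID, no NULLs in the four grouping columns, and that the copy with the lowest Event_ID in each group is the one to keep. dup_events, keep_id and the short aliases are names made up for illustration.

```sql
-- Collect every duplicate Event_ID except the lowest one in each group.
CREATE TEMPORARY TABLE dup_events (Event_ID INT PRIMARY KEY);

INSERT INTO dup_events (Event_ID)
SELECT e.Event_ID
FROM Events_new AS e
JOIN (
    SELECT `PCBID`, `EventType_ID`, `DateTime`, `User`,
           MIN(`Event_ID`) AS keep_id
    FROM Events_new
    GROUP BY `PCBID`, `EventType_ID`, `DateTime`, `User`
    HAVING COUNT(*) > 1
) AS g
  ON  e.`PCBID`        = g.`PCBID`
  AND e.`EventType_ID` = g.`EventType_ID`
  AND e.`DateTime`     = g.`DateTime`
  AND e.`User`         = g.`User`
WHERE e.Event_ID > g.keep_id;

START TRANSACTION;

-- Delete from the lowest-level child table upward so that no foreign key
-- ever points at a row that has already been removed.
DELETE m
FROM Measurements_new AS m
JOIN TestSteps_new AS ts       ON ts.TestSteps_ID = m.TestSteps_TestSteps_ID
JOIN TestInformation_new AS ti ON ti.TestInformation_ID = ts.TestInformation_TestInformation_ID
JOIN dup_events AS d           ON d.Event_ID = ti.Event_ID;

DELETE ts
FROM TestSteps_new AS ts
JOIN TestInformation_new AS ti ON ti.TestInformation_ID = ts.TestInformation_TestInformation_ID
JOIN dup_events AS d           ON d.Event_ID = ti.Event_ID;

DELETE ti
FROM TestInformation_new AS ti
JOIN dup_events AS d ON d.Event_ID = ti.Event_ID;

DELETE ed
FROM EventDetails_new AS ed
JOIN dup_events AS d ON d.Event_ID = ed.Event_ID;

DELETE e
FROM Events_new AS e
JOIN dup_events AS d ON d.Event_ID = e.Event_ID;

COMMIT;

DROP TEMPORARY TABLE dup_events;
```

Because every copy above the lowest Event_ID is collected in one pass, no loop should be needed. Table names are case-sensitive on some platforms, so the casing may need adjusting to match the actual schema.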

Regarding mysql - Deleting duplicate data in MySQL, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25797853/
