
php - Deleting a large amount of data and the primary index

Reposted. Author: 可可西里. Updated: 2023-11-01 07:44:45

I am trying to delete a large number of rows (more than 10 million, roughly 1/3 of all records) from an InnoDB MySQL table with a primary/clustered index. The `id` field is the primary/clustered key; it is sequential with no gaps. At least it should be: I have never deleted records from the middle. It is possible, though, that some INSERT queries failed and InnoDB allocated ids that were never used (I am not sure whether that is true). I am only deleting old records that are no longer needed. The table contains varchar columns, so rows do not have a fixed size.

My first attempt:

DELETE FROM `table` WHERE id<=10000000

It failed because of the heavy I/O involved. It seems MySQL killed the query and rolled back all the changes. The query ran for about 6 hours, and the rollback took roughly the same time. My biggest mistake was leaving the transaction log at its default size of 5 MB; it has to be enlarged.
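Before attempting a huge single-transaction delete, the configured redo log size can be inspected (a minimal sketch; on MySQL 8.0.30+ the preferred variable is `innodb_redo_log_capacity`):

```sql
-- Check the configured redo log size; a multi-hour DELETE in one
-- transaction needs far more than the old 5 MB default.
SHOW VARIABLES LIKE 'innodb_log_file_size';
```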

The second attempt deleted in chunks of 10,000 records, for example:

DELETE FROM `table` WHERE id<=10000;
COMMIT;
DELETE FROM `table` WHERE id<=20000;
COMMIT;

And so on. At the start, each query took about 10 seconds (on a laptop). The execution time grew gradually, and after 6 hours of running, each query took about 300 seconds.
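A common variation on this chunking pattern (not from the original post) bounds each statement with LIMIT instead of precomputed id windows, repeating until the statement affects 0 rows:

```sql
-- Repeat until the statement reports 0 affected rows.
DELETE FROM `table` WHERE id <= 10000000 ORDER BY id LIMIT 10000;
COMMIT;
```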

The third attempt aims to keep the average query execution time under 1 second. PHP code:

protected function deleteById($table, $id) {
    $MinId = $this->getMinFromTable($table, 'id');
    $PackDeleteCount = $this->PackDeleteCount;
    $timerTotal = new Timer();
    $delCountTotal = 0;
    $delCountReport = 0;
    $delInfo = array();
    // Acceptable window around the target per-query execution time.
    $PackMinTime = round($this->PackDeleteTime - $this->PackDeleteTime * $this->PackDeleteDiv, 3);
    $PackMaxTime = round($this->PackDeleteTime + $this->PackDeleteTime * $this->PackDeleteDiv, 3);
    $this->LogString(sprintf('Del `%s`, PackMinTime: %s; PackMaxTime: %s', $table, $PackMinTime, $PackMaxTime));
    while ($MinId < $id) {
        $MinId += $PackDeleteCount;
        $delCountReport += $PackDeleteCount;
        if ($MinId > $id) {
            $MinId = $id;
        }
        $timer = new Timer();
        $sql = sprintf('DELETE FROM `%s` WHERE id<=%s', $table, $MinId);
        $this->s->Query($sql, __FILE__, __LINE__);
        $delCount = $this->s->AffectedRows();
        $this->s->CommitT();
        $RoundTime = round($timer->end(), 3);
        $delInfo[] = array(
            'time' => $RoundTime,
            'rows' => $PackDeleteCount,
        );
        $delCountTotal += $delCount;
        // Periodically log aggregate statistics for the last batch of queries.
        if ($delCountReport >= $this->PackDeleteReport) {
            $delCountReport = 0;
            $delSqlCount = count($delInfo);
            $PackTime = 0;
            $PackCount = 0;
            foreach ($delInfo as $v) {
                $PackTime += $v['time'];
                $PackCount += $v['rows'];
            }
            $EvTime = round($PackTime / $delSqlCount, 2);
            $PackTime = round($PackTime, 2);
            $EvCount = round($PackCount / $delSqlCount);
            $TotalTime = $this->readableTime(intval($timerTotal->end()));
            $this->LogString(sprintf('Del `%s`, Sql query count: %d; Time: %s; Count: %d; Average Time: %s; Average count per delete: %d; Del total: %s; Del Total Time: %s; id <= %s', $table, $delSqlCount, $PackTime, $PackCount, $EvTime, $EvCount, $delCountTotal, $TotalTime, $MinId));
            $delInfo = array();
        }

        // Adapt the chunk size: grow it when the last query was faster than
        // the acceptable window, shrink it when slower.
        if ($RoundTime < $PackMinTime) {
            $PackDeleteCount = intval($PackDeleteCount + $PackDeleteCount * (1 - $RoundTime / $this->PackDeleteTime));
        } elseif ($RoundTime > $PackMaxTime) {
            $PackDeleteCount = intval($PackDeleteCount - $PackDeleteCount * (1 - $this->PackDeleteTime / $RoundTime));
        }
    }
    $this->LogString(sprintf('Finished del `%s`: time: %s', $table, round($timerTotal->end(), 2)));
}

It has a few dependencies, but they are self-explanatory and can easily be replaced with standard equivalents. I will only explain the input variables used here:

$table - target table from which rows need to be deleted
$id - all records up to this id should be deleted
$MinId - minimal id in the target table
$this->PackDeleteCount - initial number of records to start from; the row count to delete is then recalculated before each new query.
$this->PackDeleteTime - desired average query execution time. I used 0.5
$this->PackDeleteDiv - acceptable deviation from $this->PackDeleteTime, as a fraction. I used 0.3
$this->PackDeleteReport - print statistics about the deletion after every N records
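The adaptive chunk sizing at the end of the loop can be condensed into a small sketch (Python here for brevity; `target` and `deviation` correspond to $this->PackDeleteTime and $this->PackDeleteDiv):

```python
def next_batch_size(batch, elapsed, target=0.5, deviation=0.3):
    """Return the row count for the next DELETE, given how long the
    previous one took. Mirrors the proportional adjustment above."""
    lo = target * (1 - deviation)  # fastest acceptable query time
    hi = target * (1 + deviation)  # slowest acceptable query time
    if elapsed < lo:
        # Query finished early: grow the batch in proportion to the slack.
        return int(batch + batch * (1 - elapsed / target))
    if elapsed > hi:
        # Query ran long: shrink the batch toward the target time.
        return int(batch - batch * (1 - target / elapsed))
    return batch

# With the post's values (target 0.5 s, deviation 0.3), a 0.25 s query
# grows a 10,000-row batch to 15,000; a 1.0 s query shrinks it to 5,000.
print(next_batch_size(10000, 0.25))
print(next_batch_size(10000, 1.0))
```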

This variant performs stably.

The reason for the poor performance is that the database engine has to physically reorder all record data in the affected leaf pages. That is my understanding; if your knowledge goes deeper, you are welcome to add a detailed description of what actually happens. It might suggest some new ideas.

Question: is it possible to compute the distribution of rows across leaf pages and delete whole leaves, or even branches, so that the database engine does not have to re-sort the data? Perhaps you have other ideas for optimizing performance in this scenario.

Best Answer

I have run into this problem several times, and I usually go down the route of creating a partition (or several partitions up front), because this reduces the I/O InnoDB needs for large delete queries without having to rebuild the whole index tree; then delete in chunks of 1,000 - 1,500 at a time.

It is also good practice to:

  • Set AutoCommit to 1
  • Break the deletes into chunks of about 1,500 at a time
  • Make sure innodb_log_file_size is large enough
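The partitioning route the answer mentions can be sketched as follows (a hypothetical illustration; the partition boundaries are made up and the table would need to be laid out this way before the data accumulates):

```sql
-- Hypothetical: partition the table by id ranges so old data can be
-- dropped wholesale instead of deleted row by row.
ALTER TABLE `table`
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (10000000),
    PARTITION p1 VALUES LESS THAN (20000000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- Dropping a partition discards all of its rows almost instantly,
-- without touching the rest of the index tree.
ALTER TABLE `table` DROP PARTITION p0;
```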

Regarding php - deleting a large amount of data and the primary index, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23887812/
