mysql - Update a column from another table in a large MySQL database (7 million rows)

Reposted. Author: 行者123. Updated: 2023-11-29 01:31:21

Description

I have 2 tables with the following structure (irrelevant columns removed):

mysql> explain parts;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| code | varchar(32) | NO | PRI | NULL | |
| slug | varchar(255) | YES | | NULL | |
| title | varchar(64) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> explain details;
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| sku | varchar(32) | NO | PRI | NULL | |
| description | varchar(700) | YES | | NULL | |
| part_code | varchar(32) | NO | PRI | | |
+-------------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

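parts contains 184147 rows and details contains 7278870 rows. The part_code column in details refers to the code column in the parts table. Since these columns are varchar, I want to add a column id int(11) to parts and a column part_id int(11) to details. I tried this: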

mysql> alter table parts drop primary key;
Query OK, 184147 rows affected (0.66 sec)
Records: 184147 Duplicates: 0 Warnings: 0

mysql> alter table parts add column
id int(11) not null auto_increment primary key first;
Query OK, 184147 rows affected (0.55 sec)
Records: 184147 Duplicates: 0 Warnings: 0

mysql> select id, code from parts limit 5;
+----+-------------------------+
| id | code |
+----+-------------------------+
| 1 | Yhk0KqSMeLcfH1KEfykihQ2 |
| 2 | IMl4iweZdmrBGvSUCtMCJA2 |
| 3 | rAKZUDj1WOnbkX_8S8mNbw2 |
| 4 | rV09rJ3X33-MPiNRcPTAwA2 |
| 5 | LPyIa_M_TOZ8655u1Ls5mA2 |
+----+-------------------------+
5 rows in set (0.00 sec)

So now I have an id column with correct data in the parts table. After adding the part_id column to the details table:

mysql> alter table details add column part_id int(11) not null after part_code;
Query OK, 7278870 rows affected (1 min 17.74 sec)
Records: 7278870 Duplicates: 0 Warnings: 0

Now the big question is how to update part_id accordingly. The following query:

mysql> update details d
join parts p on d.part_code = p.code
set d.part_id = p.id;

ran for about 30 hours before I killed it.

Note that both tables are MyISAM:

mysql> select engine from information_schema.tables where table_schema = 'db_name' and (table_name = 'parts' or table_name = 'details');
+--------+
| ENGINE |
+--------+
| MyISAM |
| MyISAM |
+--------+
2 rows in set (0.01 sec)

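I have only now realized that one of the problems is that by dropping the primary key on the parts table I also dropped the index on the code column. On the other hand, I have the following indexes on the details table (some irrelevant columns omitted):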

mysql> show indexes from details;
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
| details | 0 | PRIMARY | 1 | sku | A | NULL | BTREE |
| details | 0 | PRIMARY | 3 | part_code | A | 7278870 | BTREE |
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
2 rows in set (0.00 sec)

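My questions are:

  1. Is the update query OK, or can it be optimized somehow?
  2. I will add an index on the code column of the parts table. Will the query then run in a reasonable time, or will it again run for days?
  3. How can I write a (sql/bash/php) script to see the progress of the query execution?

Many thanks!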

Best answer

As I mentioned in the question, I forgot about the index that was dropped on the parts table, so I added it back:

alter table parts add key code (code);

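Inspired by Puggan Se's answer, I tried to use LIMIT on the UPDATE in a PHP script, but MySQL does not allow LIMIT in an UPDATE that uses a JOIN. To be able to limit the query, I added a new column to the details table: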

# drop the primary key,
alter table details drop primary key;
# so I can create an auto_increment column
alter table details add id int not null auto_increment primary key;
# alter the id column and remove the auto_increment
alter table details change id id int not null;
# drop again the primary key
alter table details drop primary key;
# add new indexes
alter table details add primary key ( id, sku, num, part_code );

Now I can use a "limit" by restricting the range of id:

update details d
join parts p on d.part_code = p.code
set d.part_id = p.id
where d.id between 1 and 5000;
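
The range-limited statement above generalizes to one statement per batch of ids. The following is a minimal sketch in Python (chosen here only for brevity; the answer itself uses PHP) that builds those statements. The function name `batch_updates` is hypothetical, and it assumes ids run contiguously from 1 to the row count, as they do right after the auto_increment column is added.

```python
def batch_updates(total_rows, batch_size=5000):
    """Yield one range-limited UPDATE statement per batch of ids.

    Assumes ids are contiguous and start at 1 (hypothetical helper).
    """
    for lo in range(1, total_rows + 1, batch_size):
        # Clamp the upper bound so the last batch stops at the last id.
        hi = min(lo + batch_size - 1, total_rows)
        yield (
            "update details d "
            "join parts p on d.part_code = p.code "
            "set d.part_id = p.id "
            f"where d.id between {lo} and {hi}"
        )


statements = list(batch_updates(7278870))
print(len(statements))   # number of batches
print(statements[0])     # first batch starts at id 1
print(statements[-1])    # last batch ends at the last id
```

Starting the first range at 1 and clamping the last one means every id falls into exactly one batch.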

Here is the complete PHP script:

<?php
// Connection details are placeholders; replace them with your own.
$db = new mysqli('localhost', 'user', 'password', 'db_name');
if ($db->connect_error) die($db->connect_error);

$started = time();
$batch = 5000;
$total = 7278870;

echo "Started at " . date('H:i:s', $started) . PHP_EOL;

// Format a number of seconds as HH:MM:SS (truncating, not rounding).
function timef($s) {
    $s = max(0, (int) $s);
    $h = str_pad(intdiv($s, 3600), 2, '0', STR_PAD_LEFT);
    $m = str_pad(intdiv($s % 3600, 60), 2, '0', STR_PAD_LEFT);
    $s = str_pad($s % 60, 2, '0', STR_PAD_LEFT);
    return "$h:$m:$s";
}

// Start at id 1. The script as originally run started at id 5000,
// which is why ids 1..4999 were skipped (see the P.S.).
// Ids run from 1 to $total after the auto_increment step above.
for ($j = 1; $j <= $total; $j += $batch) {
    $k = $j + $batch - 1;
    $result = $db->query("
        update details d
        join parts p on d.part_code = p.code
        set d.part_id = p.id
        where d.id between $j and $k
    ");
    if (!$result) die($db->error);
    $p = round(min($k, $total) / $total * 100, 2);
    $s = time() - $started;
    $ela = timef($s);
    $eta = timef(($s / $p) * 100 - $s);
    $eq = (int) floor($p / 10);
    $bar = str_pad(str_repeat('=', $eq) . ($p < 100 ? '>' : ''), 11);
    echo "\r {$p}% | [{$bar}] | Elapsed: {$ela} | ETA: {$eta}";
}
echo PHP_EOL . 'Done!' . PHP_EOL;
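
The elapsed-time and ETA display is the part of such a progress script that is easiest to get subtly wrong: rounding the hour and minute components instead of truncating them would show 3599 seconds as 01:60:59. The sketch below, in Python for brevity, shows one way to do it with integer division. The names `format_hms` and `eta_seconds` are hypothetical, and the ETA assumes rows are processed at a roughly constant rate.

```python
def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS, truncating fractions."""
    seconds = max(0, int(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def eta_seconds(elapsed, rows_done, rows_total):
    """Estimate remaining seconds, assuming a constant processing rate."""
    if rows_done <= 0:
        return 0  # no progress yet, so no meaningful estimate
    return elapsed * (rows_total - rows_done) / rows_done


print(format_hms(3599))
print(format_hms(eta_seconds(60, 1000, 4000)))
```

Computing the ETA from rows done and rows remaining, rather than from a rounded percentage, avoids a division by zero when the percentage rounds down to 0.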

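Here is a screenshot:

[Screenshot: the script running]

As you can see, the whole process took less than 5 minutes :) Thank you all!

P.S.: There was still a bug, because I later found that 4999 rows were left with part_id = 0, but I fixed those manually.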
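
The figure of 4999 is what a loop produces when it increments its counter before computing the first range, as in the script as originally posted (`$i++` followed by `$j = $i * 5000`). The Python sketch below reproduces only that arithmetic. The function name `original_ranges` is hypothetical, and stopping once the lower bound passes the row count stands in for the original "no rows affected" exit, on the assumption that ids run contiguously from 1.

```python
def original_ranges(total_rows, batch_size=5000):
    """Id ranges visited when the counter is incremented before first use."""
    i = 0
    ranges = []
    while True:
        i += 1
        j = i * batch_size       # first value is batch_size, not 1
        k = j + batch_size - 1
        if j > total_rows:       # stand-in for "0 rows affected"
            break
        ranges.append((j, k))
    return ranges


ranges = original_ranges(7278870)
first_covered = ranges[0][0]
missed = first_covered - 1       # ids 1 .. first_covered - 1 are never visited
print(first_covered, missed)
```

Under these assumptions the first batch covers ids 5000 to 9999, so ids 1 to 4999 are never updated, which matches the number of rows left with part_id = 0.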

Regarding mysql - Update a column from another table in a large MySQL database (7 million rows), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11430362/
