mysql - Optimising MySQL queries across hierarchical data

Reposted. Author: IT老高. Updated: 2023-10-29 00:12:40

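I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000), and all edges are strictly increasing in x.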

There is also a dictionary of ~1k (key, val) pairs associated with each vertex.

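I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted it, and the foreign key constraints that refer to it, from the extracts below); a table which holds the dictionaries; and a "closure table" of connected vertices, as described so eloquently by Bill Karwin.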

The table of vertex dictionaries is defined as follows:

CREATE TABLE `VertexDictionary` (
    `x`   smallint(6) unsigned NOT NULL,
    `y`   smallint(6) unsigned NOT NULL,
    `key` varchar(50) NOT NULL DEFAULT '',
    `val` smallint(1) DEFAULT NULL,
    PRIMARY KEY (`x`, `y`, `key`),
    KEY `dict` (`x`, `key`, `val`)
);

The closure table of connected vertices is:

CREATE TABLE `ConnectedVertices` (
    `tail_x` smallint(6) unsigned NOT NULL,
    `tail_y` smallint(6) unsigned NOT NULL,
    `head_x` smallint(6) unsigned NOT NULL,
    `head_y` smallint(6) unsigned NOT NULL,
    PRIMARY KEY (`tail_x`, `tail_y`, `head_x`),
    KEY `reverse` (`head_x`, `head_y`, `tail_x`),
    KEY `fx` (`tail_x`, `head_x`),
    KEY `rx` (`head_x`, `tail_x`)
);
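To make the closure-table idea concrete, here is a minimal Python sketch of how rows for a table shaped like `ConnectedVertices` could be derived from direct edges. The function name `transitive_closure`, the toy edge list, and the choice to leave out self-pairs are all assumptions made for illustration; the question does not say how the table is actually populated.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (tail, head) pair where head is reachable from tail.

    Vertices are (x, y) tuples. Self-pairs are not emitted (an assumption).
    """
    successors = defaultdict(set)
    for tail, head in edges:
        successors[tail].add(head)

    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating empty entries for sink vertices
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical toy graph: every edge strictly increases x.
edges = [((1, 1), (2, 1)), ((1, 2), (2, 1)), ((2, 1), (3, 1))]

# Flatten to (tail_x, tail_y, head_x, head_y) rows, as in the closure table.
rows = sorted(tail + head for tail, head in transitive_closure(edges))
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the closure table, including indirect connections such as (1, 1) to (3, 1).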

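There is also a dictionary of (x, key) pairs such that, for each such pair, all vertices identified with that x have a value for that key in their dictionaries. This dictionary is stored in a fourth table: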

CREATE TABLE `SpecialKeys` (
    `x`   smallint(6) unsigned NOT NULL,
    `key` varchar(50) NOT NULL DEFAULT '',
    PRIMARY KEY (`x`),
    KEY `xkey` (`x`, `key`)
);

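I frequently wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys belonging to vertices connected to the left: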

SELECT DISTINCT
    `v`.`key`,
    `u`.`val`
FROM
    `ConnectedVertices` AS `c`
    JOIN `VertexDictionary` AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
    JOIN `VertexDictionary` AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
    JOIN `SpecialKeys`      AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
    `v`.`x` = X
;
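To illustrate what this query returns, the following sketch runs it against a tiny in-memory SQLite database from Python. Everything here is an assumption for illustration: SQLite stands in for MySQL (so nothing about performance carries over), the schema is simplified with secondary indexes omitted, the rows are invented toy data, and X is passed as a bound parameter. Row-value comparisons need SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in schema; column types and indexes differ from the MySQL DDL.
conn.executescript("""
CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT, val INTEGER,
                               PRIMARY KEY (x, y, `key`));
CREATE TABLE ConnectedVertices (tail_x INTEGER, tail_y INTEGER,
                                head_x INTEGER, head_y INTEGER,
                                PRIMARY KEY (tail_x, tail_y, head_x));
CREATE TABLE SpecialKeys (x INTEGER, `key` TEXT, PRIMARY KEY (x));

-- Hypothetical toy data: two vertices at x=1, both connected to vertex (2, 1).
INSERT INTO VertexDictionary VALUES
    (1, 1, 'colour', 10), (1, 1, 'size', 99), (1, 2, 'colour', 20),
    (2, 1, 'shape', 5), (2, 1, 'weight', 7);
INSERT INTO ConnectedVertices VALUES (1, 1, 2, 1), (1, 2, 2, 1);
INSERT INTO SpecialKeys VALUES (1, 'colour'), (2, 'shape');
""")

QUERY = """
SELECT DISTINCT `v`.`key`, `u`.`val`
FROM `ConnectedVertices` AS `c`
JOIN `VertexDictionary` AS `u` ON (`u`.`x`, `u`.`y`) = (`c`.`tail_x`, `c`.`tail_y`)
JOIN `VertexDictionary` AS `v` ON (`v`.`x`, `v`.`y`) = (`c`.`head_x`, `c`.`head_y`)
JOIN `SpecialKeys` AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE `v`.`x` = ?
"""

# X = 2: keys of the vertices at x=2, paired with the special-key values
# ('colour' at x=1) of the vertices connected to them from the left.
rows = sorted(conn.execute(QUERY, (2,)).fetchall())
for key, val in rows:
    print(key, val)
```

With this toy data, each key of vertex (2, 1) is paired with the `colour` value of each connected x=1 vertex; the non-special `size` entry never appears.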

The output of EXPLAIN is:

id  select_type  table  type    possible_keys          key      key_len  ref                               rows  Extra
1   SIMPLE       k      index   PRIMARY,xkey           xkey     154      NULL                              40    Using index; Using temporary
1   SIMPLE       c      ref     PRIMARY,reverse,fx,rx  PRIMARY  2        db.k.x                            1     Using where
1   SIMPLE       v      ref     PRIMARY,dict           PRIMARY  4        const,db.c.head_y                 136   Using index
1   SIMPLE       u      eq_ref  PRIMARY,dict           PRIMARY  156      db.c.tail_x,db.c.tail_y,db.k.key  1     Using where

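But this query takes ~10 seconds to complete. I have been banging my head against a brick wall trying to improve matters, to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!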


UPDATE

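I did rebuild the tables and found that the EXPLAIN output was slightly different (as now shown above, the number of rows fetched from v increased from 1 to 136!), but I am still getting nowhere: the query still takes ~10 seconds to execute.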

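I really don't understand what is going on here. Queries to obtain all (x, y, SpecialValue) tuples and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time. How can I reduce the time taken to perform that join?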

Output of SHOW VARIABLES LIKE '%innodb%'; below:

Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4

Best answer

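I have not spent time testing this (the example you provided is incomplete), but you should definitely try reordering the joined tables. The EXPLAIN output gives some hints; as a heuristic, ordering by key_len should be fastest. I believe the first table to be filtered on should be listed last, in case the optimizer is not able to figure that out by itself.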

So, going by key_len (2, 4, 154, 156), let's say the 'c, v, k, u' order is the best.

SELECT DISTINCT
    `v`.`key`,
    `u`.`val`
FROM
    `VertexDictionary` AS `u`
    JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
    JOIN `VertexDictionary`  AS `v`
    JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                   AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
    `v`.`x` = X
;

The 'rows' column would suggest the 'c/u, k, v' order instead, but that depends on the data:

SELECT DISTINCT
    `v`.`key`,
    `u`.`val`
FROM
    `VertexDictionary` AS `u`
    JOIN `VertexDictionary`  AS `v`
    JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
    JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                   AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
    `v`.`x` = X
;

Hope this helps.

UPDATE (avoiding the varchar join):

SELECT DISTINCT
    `v`.`key`,
    `u`.`val`
FROM
    `ConnectedVertices` AS `c`
    JOIN `VertexDictionary` AS `u` ON (`u`.`x`, `u`.`y`) = (`c`.`tail_x`, `c`.`tail_y`)
    JOIN `VertexDictionary` AS `v` ON (`v`.`x`, `v`.`y`) = (`c`.`head_x`, `c`.`head_y`)
WHERE
    (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
    AND
    `v`.`x` = X
;
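Before comparing timings, it may be worth confirming that the rewrites return the same rows as the original query. The sketch below does that on a tiny in-memory SQLite database from Python. It rests on several assumptions: SQLite (3.15 or later, for row values) stands in for MySQL, the schema is simplified, the data is invented, and the labels in `VARIANTS` are hypothetical names for the four queries in this thread. It checks equivalence only and says nothing about speed.

```python
import sqlite3

# Simplified stand-in schema with hypothetical toy data (indexes omitted).
SCHEMA = """
CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT, val INTEGER,
                               PRIMARY KEY (x, y, `key`));
CREATE TABLE ConnectedVertices (tail_x INTEGER, tail_y INTEGER,
                                head_x INTEGER, head_y INTEGER,
                                PRIMARY KEY (tail_x, tail_y, head_x));
CREATE TABLE SpecialKeys (x INTEGER, `key` TEXT, PRIMARY KEY (x));
INSERT INTO VertexDictionary VALUES
    (1, 1, 'colour', 10), (1, 1, 'size', 99), (1, 2, 'colour', 20),
    (2, 1, 'shape', 5), (2, 1, 'weight', 7);
INSERT INTO ConnectedVertices VALUES (1, 1, 2, 1), (1, 2, 2, 1);
INSERT INTO SpecialKeys VALUES (1, 'colour'), (2, 'shape');
"""

SELECT = "SELECT DISTINCT v.`key`, u.val "

# FROM/WHERE clauses of the original query and the three rewrites.
VARIANTS = {
    "original": """
        FROM ConnectedVertices AS c
        JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
        JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        WHERE v.x = ?""",
    "by_key_len": """
        FROM VertexDictionary AS u
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        JOIN VertexDictionary AS v
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)
        WHERE v.x = ?""",
    "by_rows": """
        FROM VertexDictionary AS u
        JOIN VertexDictionary AS v
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)
        WHERE v.x = ?""",
    "in_subquery": """
        FROM ConnectedVertices AS c
        JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
        JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
        WHERE (u.x, u.`key`) IN (SELECT k.x, k.`key` FROM SpecialKeys AS k)
          AND v.x = ?""",
}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Run every variant with X = 2 and collect the sorted result rows.
results = {
    name: sorted(conn.execute(SELECT + body, (2,)).fetchall())
    for name, body in VARIANTS.items()
}

all_equal = all(rows == results["original"] for rows in results.values())
print("all variants agree:", all_equal)
```

On MySQL itself, the optimizer is generally free to choose its own order for inner joins regardless of how the tables are listed, so it is worth comparing the EXPLAIN output of each variant; if a particular order has to be forced, the `STRAIGHT_JOIN` keyword exists for that purpose.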

Regarding mysql - Optimising MySQL queries across hierarchical data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10211029/
