gpt4 book ai didi

mysql - JOIN 查询太慢了。不会用INDEX?

转载 作者:搜寻专家 更新时间:2023-10-30 23:21:41 27 4
gpt4 key购买 nike

我有一个过渡表,我在查询和销毁它之前临时填充了一些值。

CREATE TABLE SearchListA(
`pTime` int unsigned NOT NULL ,
`STD` double unsigned NOT NULL,
`STD_Pos` int unsigned NOT NULL,
`SearchEnd` int unsigned NOT NULL,
UNIQUE INDEX (`pTime`,`STD` ASC) USING BTREE
) ENGINE = MEMORY;

看起来是这样的:

+------------+------------+---------+------------+
| pTime | STD | STD_Pos | SearchEnd |
+------------+------------+---------+------------+
| 1105715400 | 1.58474499 | 0 | 1105723200 |
| 1106297700 | 2.5997839 | 0 | 1106544000 |
| 1107440400 | 2.04860375 | 0 | 1107440700 |
| 1107440700 | 1.58864998 | 0 | 1107467400 |
| 1107467400 | 1.55207218 | 0 | 1107790500 |
| 1107790500 | 2.04239417 | 0 | 1108022100 |
| 1108022100 | 1.61385678 | 0 | 1108128000 |
| 1108771500 | 1.58835083 | 0 | 1108771800 |
| 1108771800 | 1.65734727 | 0 | 1108772100 |
| 1108772100 | 2.09378189 | 0 | 1109027700 |
+------------+------------+---------+------------+

只有列 pTimeSearchEnd 与我的问题相关。

我的目的是使用此表来加快在更大的静态表中的搜索速度。

The first column, pTime, is where the search should start
The fourth column, SearchEnd, is where the search should end

大表类似;它看起来像这样:

CREATE TABLE `b50d1_abs` (
`pTime` int(10) unsigned NOT NULL,
`Slope` double NOT NULL,
`STD` double NOT NULL,
`Slope_Pos` int(11) NOT NULL,
`STD_Pos` int(11) NOT NULL,
PRIMARY KEY (`pTime`),
KEY `Slope` (`Slope`) USING BTREE,
KEY `STD` (`STD`),
KEY `ID1` (`pTime`,`STD`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED;

+------------+-------------+------------+-----------+---------+
| pTime | Slope | STD | Slope_Pos | STD_Pos |
+------------+-------------+------------+-----------+---------+
| 1107309300 | 1.63257919 | 1.39241698 | 0 | 1 |
| 1107314400 | 6.8959276 | 0.22425643 | 1 | 1 |
| 1107323100 | 18.19909502 | 1.46854808 | 1 | 0 |
| 1107335400 | 2.50135747 | 0.4736305 | 0 | 0 |
| 1107362100 | 4.28778281 | 0.85576985 | 0 | 1 |
| 1107363300 | 6.96289593 | 1.41299044 | 0 | 0 |
| 1107363900 | 8.10316742 | 0.2859726 | 0 | 0 |
| 1107367500 | 16.62443439 | 0.61587645 | 0 | 0 |
| 1107368400 | 19.37918552 | 1.18746968 | 0 | 0 |
| 1107369300 | 21.94570136 | 0.94261744 | 0 | 0 |
| 1107371400 | 25.85701357 | 0.2741292 | 0 | 1 |
| 1107375300 | 21.98914027 | 1.59521158 | 0 | 1 |
| 1107375600 | 20.80542986 | 1.59231289 | 0 | 1 |
| 1107375900 | 19.62714932 | 1.50661679 | 0 | 1 |
| 1107381900 | 8.23167421 | 0.98048205 | 1 | 1 |
| 1107383400 | 10.68778281 | 1.41607579 | 1 | 0 |
+------------+-------------+------------+-----------+---------+
...etc (439340 rows)

此处,pTimeSTDSTD_Pos 列与我的问题相关。

对于较小表 (SearchListA) 中的每个元素,我需要在较大表 (b50d1_abs()) 中搜索指定范围并返回包含高于当前 SearchListA.pTime 的最低 b50d1_abs.pTime 并且也符合以下条件:

SearchListA.STD < b50d1_abs.STD AND SearchListA.STD_Pos <> b50d1_abs.STD_Pos

b50d1_abs.pTime < SearchListA.SearchEnd

后一个条件只是为了减少搜索的长度。

在我看来,这是一个非常简单的查询,应该能够使用索引;特别是因为所有值都是无符号数字 - 但我无法让它执行得足够快!我认为这是因为它每次都会重建整个表,而不是仅仅从中省略值。

如果有人看一下我的代码并想出一个更有效的方法来解决这个问题,我将非常感激:

SELECT   
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
mu.pTime AS CloseTime
FROM
SearchListA m
JOIN b50d1_abs mu ON mu.pTime =(
SELECT
md.pTime
FROM
b50d1_abs as md
WHERE
md.pTime > m.pTime
AND md.pTime <=m.SearchEnd
AND m.STD < md.STD AND m.STD_Pos <> md.STD_Pos

LIMIT 1
);

这是我的EXPLAIN EXTENDED声明:

+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| 1 | PRIMARY | m | ALL | NULL | NULL | NULL | NULL | 365 | 100.00 | |
| 1 | PRIMARY | mu | eq_ref | PRIMARY,ID1 | PRIMARY | 4 | func | 1 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | md | ALL | PRIMARY,STD,ID1 | NULL | NULL | NULL | 439340 | 100.00 | Using where |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+

看起来最长的查询 (#2) 根本没有使用索引!

如果我尝试 FORCE INDEX 然后它会在 possible_keys 下列出它,但仍然在 Key 下列出 NULL 并且仍然需要很长时间(超过 80 秒)。

我需要在 10 秒内完成此查询;甚至 10 也太长了。

最佳答案

你的子查询是一个依赖子查询,所以最好的情况是它会为表 m 中的每一行计算一次。因为 m 包含几行,所以没问题。

但是如果您将该子查询置于 JOIN 条件中,它将被执行 (rows in m)*(rows in mu) 次,无论如何。

请注意,您的结果可能不正确,因为:

return the row with the lowest b50d1_abs.pTime

但您没有在任何地方指定。

试试这个查询:

SELECT   
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT min( big.pTime )
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
) AS CloseTime
FROM SearchListA m

或者这个:

SELECT   
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
min( big.pTime )
FROM
SearchListA m
JOIN b50d1_abs as big ON (
big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
)
GROUP BY m.pTime

(如果您还想要搜索不成功的行,请将其设为 LEFT JOIN)。

SELECT   
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT big.pTime
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
ORDER BY big.pTime LIMIT 1
) AS CloseTime
FROM SearchListA m

(尝试在 b50d1_abs(pTime、STD、STD_Pos)上建立索引

仅供引用,这里有一些在测试数据集上使用 Postgres 的测试,应该看起来像你的(可能是远程的,哈哈)

CREATE TABLE small ( 
pTime INT PRIMARY KEY,
STD FLOAT NOT NULL,
STD_POS BOOL NOT NULL,
SearchEnd INT NOT NULL
);

CREATE TABLE big(
pTime INTEGER PRIMARY KEY,
Slope FLOAT NOT NULL,
STD FLOAT NOT NULL,
Slope_Pos BOOL NOT NULL,
STD_POS BOOL NOT NULL
);

INSERT INTO small SELECT
n*100000,
random(),
random()<0.1,
n*100000+random()*50000
FROM generate_series( 1, 365 ) n;

INSERT INTO big SELECT
n*100,
random(),
random(),
random() > 0.5,
random() > 0.5
FROM generate_series( 1, 500000 ) n;

Query 1 : 6.90 ms (yes milliseconds)
Query 2 : 48.20 ms
Query 3 : 6.46 ms

关于mysql - JOIN 查询太慢了。不会用INDEX?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5928086/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com