gpt4 book ai didi

SQL WHERE xx IN ('qq' , 'ww' ...) 性能 - 更多的值(value),更少的时间?

转载 作者:行者123 更新时间:2023-12-04 16:11:06 25 4
gpt4 key购买 nike

我终于问了我的第一个问题(虽然我是一个长期跟踪者)。

前几天在工作中,一个 SQL 查询引起了我的注意。问题在于 WHERE 的性能子句,当使用 IN 将索引与可能的值进行比较时运算符(operator)。

SELECT   SUM (parts.quantity) AS quantity,
concessions.concessionCode,
concessions.description AS concessionDesc,
parts.type,
activities.activityCode,
REPLACE (activities.activityCode, activities.lvl2 || '-', '') AS activityCodeDisplay,
strings.activityDesc,
strings.activityDesc2,
strings.activityDesc3
FROM tb_parts parts,
tb_activities activities,
tb_strings strings,
tb_concessions concessions
WHERE parts.activityCode = activities.activityCode
AND parts.concessionCode = activities.concessionCode
AND activities.concessionCode = concesions.concessionCode
AND activities.concessionCode = strings.concessionCode
AND activities.activityCode = strings.activityCode
AND strings.language = 'ENG'
--AND parts.concesionCode IN ('ZD', 'G9', 'TR', 'JS0')
AND parts.concesionCode IN ('ZD', 'G9')
AND parts.date >= TO_DATE ('01/01/2013 00:00:00', 'DD/MM/YYYY HH24:MI:SS')
AND parts.date <= TO_DATE ('30/04/2013 23:59:59', 'DD/MM/YYYY HH24:MI:SS')
AND parts.type IN ('U', 'M')
AND parts.value = 'E'
GROUP BY concesions.concessionCode,
concesions.description,
parts.type,
activities.activityCode,
REPLACE (activities.activityCode, activities.lvl2|| '-', ''),
strings.activityDesc,
strings.activityDesc2,
strings.activityDesc3
ORDER BY concesions.concessionCode;

我遇到的问题是 - 如果查询像这样运行( IN 有两个值),则需要 30 秒。如果它使用四个值运行(就像在注释行中一样),则查询需要 5 秒。我希望将索引与多个值进行比较会花费更多时间,但似乎并非如此。我在白天多次重复“测试”,它们总是或多或少相同(30 +-1s,5 +-1s)。

任何有关为什么会以这种方式行事的见解都将不胜感激!

附言我已经翻译了表/列的名称,如果有任何差异,请见谅。

P.P.S.我已经用连接重写了这段代码,它的速度要快得多,但这种异常背后的原因仍然困扰着我:)

编辑 : 终于上类了!经过一些修补,我已经能够为这两个版本创建执行计划,甚至为查询的第三个版本(使用 4 和 2 where 中的值,时间约为 600 毫秒) .另外,关于表格中的数据有几个问题,所以这里有一些信息:
All the stats are analyzed the day that queries were executed

Table parts
total rows - 3.2 M
matches for 2 values - 1.08 M (~34%)
matches for 4 values - 1.30 M (~41%)
Table activities
total rows - 3866
matches for 2 values - 321 (~ 8%)
matches for 4 values - 644 (~16%)
Table strings
total rows - 7436
matches for 2 values - 642 (~ 8%)
matches for 4 values - 1288 (~17%)

Index in_parts
codConcession
username
date

正因为如此,我认为使用动态采样时没有大的区别(除了+2/3s)(如果我做对了,也就是说,在 /*+ dynamic_sampling(tb_parts 10) */ 关键字之后使用 SELECT)

对于 两个值 :
----------------------------------------------------------------------------------------------------- 
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 864 (1)| 00:00:11 |
| 1 | SORT ORDER BY | | 1 | 186 | 864 (1)| 00:00:11 |
| 2 | HASH GROUP BY | | 1 | 186 | 864 (1)| 00:00:11 |
|* 3 | TABLE ACCESS BY INDEX ROWID | tb_parts | 1 | 37 | 818 (1)| 00:00:10 |
| 4 | NESTED LOOPS | | 1 | 186 | 862 (1)| 00:00:11 |
| 5 | NESTED LOOPS | | 1 | 149 | 44 (0)| 00:00:01 |
| 6 | NESTED LOOPS | | 34 | 2108 | 10 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 2 | 54 | 2 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | pk_concession | 2 | | 1 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | tb_activities | 17 | 595 | 4 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | pk_activity | 17 | | 2 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 13 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
|* 14 | INDEX RANGE SCAN | in_parts | 454 | | 648 (1)| 00:00:08 |
-----------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U')
AND "parts"."activityCode"="activities"."activityCode")

9 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='ZD')

11 - access("activities"."concessionCode"="concessions"."concessionCode")
filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='ZD')

13 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='ZD')

14 - access("parts"."concessionCode"="activities"."concessionCode"
AND "parts"."date">=TO_DATE('2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
filter("parts"."date">=TO_DATE('2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND ("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='ZD')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))

对于 四个值 :
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 7412 (2)| 00:01:29 |
| 1 | SORT ORDER BY | | 1 | 186 | 7412 (2)| 00:01:29 |
| 2 | HASH GROUP BY | | 1 | 186 | 7412 (2)| 00:01:29 |
| 3 | NESTED LOOPS | | 1 | 186 | 7410 (2)| 00:01:29 |
|* 4 | HASH JOIN | | 17 | 1683 | 7393 (2)| 00:01:29 |
|* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | tb_parts | 4310 | 155K| 7372 (2)| 00:01:29 |
| 11 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 12 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------

4 - access("parts"."activityCode"="activities"."activityCode"
AND "parts"."concessionCode"="activities"."concessionCode")

5 - access("activities"."concessionCode"="concessions"."concessionCode")

8 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='JS0'
OR "concessions"."concessionCode"='TR'
OR "concessions"."concessionCode"='ZD')

9 - filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='JS0'
OR "activities"."concessionCode"='TR'
OR "activities"."concessionCode"='ZD')
10 - filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U')
AND ("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='JS0'
OR "parts"."concessionCode"='TR'
OR "parts"."concessionCode"='ZD')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))

12 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='JS0'
OR "strings"."concessionCode"='TR'
OR "strings"."concessionCode"='ZD')

最后 六个值 :
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 186 | 4525 (1)| 00:00:55 |
| 1 | SORT ORDER BY | | 1 | 186 | 4525 (1)| 00:00:55 |
| 2 | HASH GROUP BY | | 1 | 186 | 4525 (1)| 00:00:55 |
| 3 | NESTED LOOPS | | 1 | 186 | 4523 (1)| 00:00:55 |
|* 4 | HASH JOIN | | 9 | 891 | 4514 (1)| 00:00:55 |
|* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 |
| 6 | INLIST ITERATOR | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 |
| 10 | INLIST ITERATOR | | | | | |
|* 11 | TABLE ACCESS BY INDEX ROWID | tb_parts | 2155 | 79735 | 4493 (1)| 00:00:54 |
|* 12 | INDEX RANGE SCAN | in_parts | 8620 | | 1277 (1)| 00:00:16 |
| 13 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("parts"."activityCode"="activities"."activityCode"
AND "parts"."concessionCode"="activities"."concessionCode")

5 - access("activities"."concessionCode"="concessions"."concessionCode")

8 - access("concessions"."concessionCode"='G9'
OR "concessions"."concessionCode"='JS0'
OR "concessions"."concessionCode"='TR'
OR "concessions"."concessionCode"='ZD')

9 - filter("activities"."concessionCode"='G9'
OR "activities"."concessionCode"='JS0'
OR "activities"."concessionCode"='TR'
OR "activities"."concessionCode"='ZD')

11 - filter("parts"."value"='E'
AND ("parts"."type"='M' OR "parts"."type"='U'))

12 - access(("parts"."concessionCode"='G9'
OR "parts"."concessionCode"='ZD')
AND "parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))
filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')
AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59',
'syyyy-mm-dd hh24:mi:ss'))

14 - access("activities"."concessionCode"="strings"."concessionCode"
AND "activities"."activityCode"="strings"."activityCode"
AND "strings"."language"='ENG')
filter("strings"."concessionCode"='G9'
OR "strings"."concessionCode"='JS0'
OR "strings"."concessionCode"='TR'
OR "strings"."concessionCode"='ZD')

由于这是我第一次与执行计划会面,我只能猜测延迟的原因是什么。
在 4 到 6 个值之间,我猜这是从 FULL ACCESS 到 ACCESS BY INDEX 的变化。此外,在访问表时,四个值 (id 10) 的过滤器包含所有四个让步值;而对于六个值,两个特许值在访问部分,过滤器只包含日期、类型和值。

最佳答案

一般来说,出现这种异常的原因是查询优化器无法准确预测成本。准确了解成本的唯一方法是使用不同的执行计划实际运行语句多次。相反,它使用统计数据来估算成本,有时估算是错误的。

当您比较“有两个值”和“有四个值”的执行计划时,您可以看到后者产生更高的成本,并且计划完全不同。优化器可以在这两个执行计划之间进行选择,并且一定认为第一个有两个值更好,而第二个有四个值更好。然而,实际上在这两种情况下,第二种都更好。

如果您仔分割析此类异常,您通常会得出一些见解,例如某些值的组合在您的数据中被高估或低估。在统计数据中使用直方图可为优化器提供更多线索,它可以更好地处理“倾斜数据”,但其预测能力仍将受到限制。

在实践中,解决方案正是您所做的:重写 SQL,直到获得可接受的性能。通常,“提示”(在 Oracle 中)也可以为优化器提供更多线索。

关于SQL WHERE xx IN ('qq' , 'ww' ...) 性能 - 更多的值(value),更少的时间?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/16120485/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com