gpt4 book ai didi

sql - 添加 where 子句会使查询速度变慢

转载 作者:行者123 更新时间:2023-12-04 13:33:03 27 4
gpt4 key购买 nike

我的 Oracle SQL 查询如下-
Q1-

select 
hca.account_number,
hca.attribute3,
SUM(rcl.extended_amount)
from
ra_customer_trx_all rct
JOIN ra_customer_trx_lines_all rcl
ON rct.customer_trx_id = rcl.customer_trx_id
JOIN HZ_CUST_ACCOUNTS_ALL hca
ON rct.bill_to_customer_id = hca.cust_account_id
where
rct.trx_date >= TO_DATE('01-MAR-2020', 'DD-MON-YYYY')
AND
rct.trx_date < TO_DATE('03-MAR-2020', 'DD-MON-YYYY')
AND
rcl.line_type = 'LINE'
AND
rct.org_id = 3523
AND hca.account_number = '1063431'
group by
hca.account_number,
hca.attribute3;
看起来很简单,但我似乎无法理解为什么上面的查询运行 拍品 与完全相同的查询版本相比更慢,除了我注释掉最后一个 where 语句 -
Q2 -
select 
hca.account_number,
hca.attribute3,
SUM(rcl.extended_amount)
from
ra_customer_trx_all rct
JOIN ra_customer_trx_lines_all rcl
ON rct.customer_trx_id = rcl.customer_trx_id
JOIN HZ_CUST_ACCOUNTS_ALL hca
ON rct.bill_to_customer_id = hca.cust_account_id
where
rct.trx_date >= TO_DATE('01-MAR-2020', 'DD-MON-YYYY')
AND
rct.trx_date < TO_DATE('03-MAR-2020', 'DD-MON-YYYY')
AND
rcl.line_type = 'LINE'
AND
rct.org_id = 3523
-- AND hca.account_number = '1063431' ***THIS IS NOW COMMENTED***
group by
hca.account_number,
hca.attribute3;
Q1大约需要 20 分钟 而 Q2 运行于 1 秒 .
我不太擅长查询调优,但我在 account_number 上有一个索引,确保没有隐式数据类型转换。解释计划告诉我 where 子句将使用索引唯一扫描,我认为这是正确的选项。我会假设添加 where 子句应该会使其更快,因为您正在缩小要分组的行数。但似乎我的假设是不正确的,因为我观察到的与我的假设完全相反。
解释下面给出的计划 -
Q1(带Where子句)
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 245 (100)| |
| 1 | HASH GROUP BY | | 1 | 53 | 245 (1)| 00:00:01 |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 1 | 53 | 245 (1)| 00:00:01 |
| 4 | NESTED LOOPS | | 1 | 38 | 240 (1)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| HZ_CUST_ACCOUNTS | 1 | 15 | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | HZ_CUST_ACCOUNTS_U2 | 1 | | 1 (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID| RA_CUSTOMER_TRX_ALL | 1 | 23 | 238 (1)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | XXCCC_RA_CUSTOMER_TRX_ALL_N97 | 1353 | | 8 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | RA_CUSTOMER_TRX_LINES_N2 | 2 | | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS BY INDEX ROWID | RA_CUSTOMER_TRX_LINES_ALL | 1 | 15 | 5 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("HCA"."ACCOUNT_NUMBER"='1063431')
7 - filter(("RCT"."TRX_DATE">=TO_DATE(' 2020-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"RCT"."ORG_ID"=3523 AND "RCT"."TRX_DATE"<TO_DATE(' 2020-03-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
8 - access("RCT"."BILL_TO_CUSTOMER_ID"="HCA"."CUST_ACCOUNT_ID")
9 - access("RCT"."CUSTOMER_TRX_ID"="RCL"."CUSTOMER_TRX_ID")
10 - filter("RCL"."LINE_TYPE"='LINE')

Q2(无Where子句)
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 137K(100)| |
| 1 | HASH GROUP BY | | 43 | 1892 | | 137K (1)| 00:00:11 |
| 2 | NESTED LOOPS | | | | | | |
| 3 | NESTED LOOPS | | 34875 | 1498K| | 137K (1)| 00:00:11 |
| 4 | VIEW | VW_GBF_9 | 29570 | 837K| | 4665 (1)| 00:00:01 |
| 5 | HASH GROUP BY | | 29570 | 1097K| 1400K| 4665 (1)| 00:00:01 |
|* 6 | HASH JOIN | | 29570 | 1097K| | 4369 (1)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID| RA_CUSTOMER_TRX_ALL | 29570 | 664K| | 3904 (1)| 00:00:01 |
|* 8 | INDEX SKIP SCAN | RA_CUSTOMER_TRX_N17 | 33634 | | | 189 (1)| 00:00:01 |
| 9 | TABLE ACCESS FULL | HZ_CUST_ACCOUNTS | 43897 | 643K| | 464 (1)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | RA_CUSTOMER_TRX_LINES_N2 | 2 | | | 3 (0)| 00:00:01 |
|* 11 | TABLE ACCESS BY INDEX ROWID | RA_CUSTOMER_TRX_LINES_ALL | 1 | 15 | | 5 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("RCT"."BILL_TO_CUSTOMER_ID"="HCA"."CUST_ACCOUNT_ID")
7 - filter("RCT"."ORG_ID"=3523)
8 - access("RCT"."TRX_DATE">=TO_DATE(' 2020-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"RCT"."TRX_DATE"<TO_DATE(' 2020-03-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
filter(("RCT"."TRX_DATE">=TO_DATE(' 2020-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"RCT"."TRX_DATE"<TO_DATE(' 2020-03-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
10 - access("ITEM_1"="RCL"."CUSTOMER_TRX_ID")
11 - filter("RCL"."LINE_TYPE"='LINE')

编辑:对不起,在下面添加了执行计划的图像。
Q1
Q2

最佳答案

首先,来自explain plan我们可以看到表统计信息不是实际的。请收集表统计信息。其次,你需要展示带有执行统计信息的真实执行计划,而不仅仅是explain plan .
这里的主要问题是 CBO 在第一个计划的第 9 行得到错误的基数:

|*  8 |       INDEX RANGE SCAN          | XXCCC_RA_CUSTOMER_TRX_ALL_N97 |  1353 |       |     8   (0)| 00:00:01 |

8 - access("RCT"."BILL_TO_CUSTOMER_ID"="HCA"."CUST_ACCOUNT_ID")
如您所见,它通过 BILL_TO_CUSTOMER_ID="HCA"."CUST_ACCOUNT_ID" 从索引中获取 ROWID。然后通过这些 ROWID 从表 RA_CUSTOMER_TRX_ALL 中获取行并按日期过滤它们:
|*  7 |      TABLE ACCESS BY INDEX ROWID| RA_CUSTOMER_TRX_ALL           |     1 |    23 |   238   (1)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | XXCCC_RA_CUSTOMER_TRX_ALL_N97 | 1353 | | 8 (0)| 00:00:01 |

Predicate Information (identified by operation id):
---------------------------------------------------

7 - filter(("RCT"."TRX_DATE">=TO_DATE(' 2020-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"RCT"."ORG_ID"=3523 AND "RCT"."TRX_DATE"<TO_DATE(' 2020-03-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
8 - access("RCT"."BILL_TO_CUSTOMER_ID"="HCA"."CUST_ACCOUNT_ID")

所以看起来你得到了大量满足 BILL_TO_CUSTOMER_ID 条件的行和非常少量的行 "TRX_DATE" between date'2020-03-01' and date'2020-03-03' .
虽然第二个计划显示它使用 INDEX_SS 按日期获取第一行。
因此,您有以下选择:
  • 您需要收集/修复统计数据,以便 CBO 可以获得正确的基数来制定计划; (最佳)
  • 您可以添加提示 dynamic_sampling所以oracle可以在解析过程中得到正确的基数; (第一个选项更好)
  • (最佳)您可以在 ra_customer_trx_all(BILL_TO_CUSTOMER_ID, TRX_DATE) 上添加索引或以相反的顺序ra_customer_trx_all(TRX_DATE, BILL_TO_CUSTOMER_ID) .您需要分析在您的情况下哪个是最好的(即考虑其他查询、操作等)。通常最好减少 ISS(索引跳过扫描)的数量,因为它很难操作。所以在索引的情况下 (BILL_TO_CUSTOMER_ID, TRX_DATE) CBO 可以使用此索引轻松获取少量所需的行。
  • 您可以使用提示或 sql 配置文件或 sql 基线强制 CBO 对第一个查询使用类似的计划,例如使用以下提示:

  • select --+ leading(hca rct rcl) use_hash(rct) index(rct) use_nl(rcl)
    hca.account_number,
    hca.attribute3,
    SUM(rcl.extended_amount)
    from
    ra_customer_trx_all rct
    JOIN ra_customer_trx_lines_all rcl
    ON rct.customer_trx_id = rcl.customer_trx_id
    JOIN HZ_CUST_ACCOUNTS_ALL hca
    ON rct.bill_to_customer_id = hca.cust_account_id
    where
    rct.trx_date >= TO_DATE('01-MAR-2020', 'DD-MON-YYYY')
    AND rct.trx_date < TO_DATE('03-MAR-2020', 'DD-MON-YYYY')
    AND rct.org_id = 3523

    AND rcl.line_type = 'LINE'

    AND hca.account_number = '1063431'
    group by
    hca.account_number,
    hca.attribute3;
    如果您不能使用第一个选项,则第四个选项只是一种解决方法。
    更新
    同样非常重要的事情(在您的情况下可能是最重要的):在大数据倾斜的情况下(这在每个帐户的交易数量的情况下很常见)您可以收集直方图, 但是 在连接谓词的情况下,它们无济于事,即直方图可能会在 select * from t where skewed_col=literal 的情况下帮助您或 select * from t where skewed_col=:bind (具有绑定(bind)变量窥视和/或自适应游标共享),但不适用于
    select * 
    from x, t
    where x.id = literal
    and t.skewed_col = x.col
    在这种情况下,CBO 不能使用直方图,因为它不知道确切的值并采用平均选择性。
    您可以轻松检查数据是如何倾斜的:
    select *
    from (
    select
    v.*,
    dense_rank()over(order by cnt asc) min_n,
    dense_rank()over(order by cnt desc) max_n
    from (
    select
    bill_to_customer_id,
    count(*) cnt
    from ra_customer_trx_all rct
    group by bill_to_customer_id
    ) v
    )
    where min_n <=10 or max_n >=10;
    您会看到最受欢迎和最不受欢迎的前 10 名 bill_to_customer_id .
    仍然选项 3(索引(bill_to_customer_id,trx_date)即使在数据有偏差的情况下也会为您提供帮助。
    还有另一种方式,但 oracle 不支持它,所以仅供引用:
    select 
    hca.account_number,
    hca.attribute3,
    SUM(rcl.extended_amount)
    from
    ra_customer_trx_all rct
    JOIN ra_customer_trx_lines_all rcl
    ON rct.customer_trx_id = rcl.customer_trx_id
    JOIN HZ_CUST_ACCOUNTS_ALL hca
    ON rct.bill_to_customer_id = hca.cust_account_id
    where
    rct.trx_date >= TO_DATE('01-MAR-2020', 'DD-MON-YYYY')
    AND
    rct.trx_date < TO_DATE('03-MAR-2020', 'DD-MON-YYYY')
    AND
    rcl.line_type = 'LINE'
    AND
    rct.org_id = 3523
    AND hca.account_number = '1063431'
    -- added: *** don't use in production!
    AND rct.bill_to_customer_id in
    (select/*+ precompute_subquery */
    h.cust_account_id
    from HZ_CUST_ACCOUNTS_ALL h
    where h.account_number='1063431')
    group by
    hca.account_number,
    hca.attribute3;
    如您所见,我添加了 bill_to_customer_id in (select...)带有未记录的提示 precompute_subquery ,这会强制 Oracle 用“预计算”值替换该子查询,即 oracle 执行该子查询并用它们替换子查询,因此 CBO 现在可以使用直方图来正确估计基数。但它不受 Oracle 支持,因此您可以使用它,但未经 Oracle 支持人员批准,请勿在生产中使用它。

    关于sql - 添加 where 子句会使查询速度变慢,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63851702/

    27 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com