gpt4 book ai didi

sql - 帮助优化 Oracle 查询

转载 作者:行者123 更新时间:2023-12-02 05:55:40 26 4
gpt4 key购买 nike

我将通过说明我正在使用 Oracle 10g 企业版并且我对 Oracle 相对较新来作为这个问题的序言。

我有一个具有以下架构的表:

ID           integer (pk)  -- unique index
PERSON_ID integer (fk) -- b-tree index
NAME_PART nvarchar -- b-tree index
NAME_PART_ID integer (fk) -- bitmap index

PERSON_ID 是个人记录的唯一 ID 的外键。 NAME_PART_ID 是查找表的外键,具有静态值,如“名字”、“中间名”、“姓氏”等。该表的要点是存储人们的各个部分分别命名。每个人的记录至少有一个名字。在尝试提取数据时,我首先考虑使用连接,如下所示:

select
first_name.person_id,
first_name.name_part,
middle_name.name_part,
last_name.name_part
from
NAME_PARTS first_name
left join
NAME_PARTS middle_name
on first_name.person_id = middle_name.person_id
left join
NAME_PARTS last_name
on first_name.person_id = last_name.person_id
where
first_name.name_part_id = 1
and middle_name.name_part_id = 2
and last_name.name_part_id = 3;

但该表有数千万条记录,NAME_PART_ID 列的位图索引未被使用。解释计划表明优化器正在使用全表扫描和散列连接来检索数据。

有什么建议吗?

编辑:以这种方式设计表的原因是因为数据库在多种不同的文化中使用,每种文化对个人的命名方式都有不同的约定(例如,在某些中东文化中,个人通常有名字,然后是他们父亲的名字,然后是他父亲的名字,等等)。很难创建一个包含多个列的表格来说明所有文化差异。

最佳答案

鉴于您实际上是在进行全表扫描(因为您的查询是从该表中提取所有数据,不包括没有名称部分的少数行,如第一、中间或最后),您可能需要考虑编写查询,以便它仅以稍微不同的格式返回数据,例如:

  SELECT person_id
, name_part_id
, name_part
FROM NAME_PART
WHERE name_part_id IN (1, 2, 3)
ORDER BY person_id
, name_part_id;

当然,您最终会得到 3 行,而不是每个名称一行,但是对于您的客户端代码来说,将它们汇总在一起可能是微不足道的。您还可以使用 decode、group by 和 max 将 3 行合并为一行:

  SELECT person_id
, max(decode(name_part_id, 1, name_part, null)) first
, max(decode(name_part_id, 2, name_part, null)) middle
, max(decode(name_part_id, 3, name_part, null)) last
FROM NAME_PART
WHERE name_part_id IN (1, 2, 3)
GROUP BY person_id
ORDER BY person_id;

这将产生与您的原始查询相同的结果。两个版本都只会扫描表一次(排序),而不是处理 3 向连接。如果您将表设为 person_id 索引上的索引组织表,则可以省去排序步骤。

我对一张有 56,150 人的 table 进行了测试,以下是结果的概要:

原始查询:

Execution Plan
----------------------------------------------------------

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 113K| 11M| | 1364 (2)|
|* 1 | HASH JOIN | | 113K| 11M| 2528K| 1364 (2)|
|* 2 | TABLE ACCESS FULL | NAME_PART | 56150 | 1864K| | 229 (3)|
|* 3 | HASH JOIN | | 79792 | 5298K| 2528K| 706 (2)|
|* 4 | TABLE ACCESS FULL| NAME_PART | 56150 | 1864K| | 229 (3)|
|* 5 | TABLE ACCESS FULL| NAME_PART | 56150 | 1864K| | 229 (3)|
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("FIRST_NAME"."PERSON_ID"="LAST_NAME"."PERSON_ID")
2 - filter("LAST_NAME"."NAME_PART_ID"=3)
3 - access("FIRST_NAME"."PERSON_ID"="MIDDLE_NAME"."PERSON_ID")
4 - filter("FIRST_NAME"."NAME_PART_ID"=1)
5 - filter("MIDDLE_NAME"."NAME_PART_ID"=2)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6740 consistent gets
0 physical reads
0 redo size
5298174 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
56150 rows processed

我的查询 #1(3 行/人):

Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 168K| 5593K| | 1776 (2)|
| 1 | SORT ORDER BY | | 168K| 5593K| 14M| 1776 (2)|
|* 2 | TABLE ACCESS FULL| NAME_PART | 168K| 5593K| | 230 (3)|
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1005 consistent gets
0 physical reads
0 redo size
3799794 bytes sent via SQL*Net to client
78837 bytes received via SQL*Net from client
11231 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
168450 rows processed

我的查询 #2(1 行/人):

Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 56150 | 1864K| | 1115 (3)|
| 1 | SORT GROUP BY | | 56150 | 1864K| 9728K| 1115 (3)|
|* 2 | TABLE ACCESS FULL| NAME_PART | 168K| 5593K| | 230 (3)|
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1005 consistent gets
0 physical reads
0 redo size
5298159 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
56150 rows processed

事实证明,您还可以更快地挤压它;我试图通过添加索引提示来强制使用 person_id 索引来避免排序。我设法又减少了 10%,但它看起来仍然在排序:

  SELECT /*+ index(name_part,NAME_PART_person_id) */ person_id
, max(decode(name_part_id, 1, name_part)) first
, max(decode(name_part_id, 2, name_part)) middle
, max(decode(name_part_id, 3, name_part)) last
FROM name_part
WHERE name_part_id IN (1, 2, 3)
GROUP BY person_id
ORDER BY person_id;

Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 56150 | 1864K| | 3385 (1)|
| 1 | SORT GROUP BY | | 56150 | 1864K| 9728K| 3385 (1)|
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | NAME_PART | 168K| 5593K| | 2500 (1)|
| 4 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 5 | BITMAP INDEX SINGLE VALUE | NAME_PART_NAME_PART_ID| | | | |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
971 consistent gets
0 physical reads
0 redo size
5298159 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
56150 rows processed

但是,上述计划都是基于您从整个表格中进行选择的假设。如果您根据 person_id 限制结果(例如,person_id 在 55968 和 56000 之间),事实证明您使用散列连接的原始查询是最快的(对于我指定的约束,27 对 106 一致获取)。

在第三方面,如果上面的查询被用于填充使用光标滚动结果集的 GUI(这样一开始您只会看到结果集的前 N ​​行 - 通过添加在此处复制一个“and rowcount < 50”谓词),我的查询版本再次变得很快 - 非常快(4 个一致获取对 417 个)。

这个故事的寓意是,它实际上完全取决于您访问数据的方式。适用于整个结果集的查询在应用于不同的子集时可能会更差。

关于sql - 帮助优化 Oracle 查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1272990/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com