oracle - Designing SQL and indexes to improve count(*) query performance

Reposted. Author: 行者123. Updated: 2023-12-02 06:56:20

Hi everyone :) I'm building a tool to do some volume sampling on our Oracle 10g database. Here is the query:

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.ID
WHERE
( product.CATEGORY = 'some first category criteria'
AND customer.REGION = 'some first region criteria'
AND ...)
OR
( product.CATEGORY = 'some second category criteria'
AND customer.REGION = 'some second region criteria'
AND ...)
OR ...

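All I need from this query is the count. The problem is that the data is large, around 30 million rows in each table, and I want this query to be responsive.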

So far, having composite indexes on customer (<search criteria column>, CUSTOMER_ID) helps a lot. I guess it helps Oracle do the JOIN after the index filtering operations.
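
For reference, a composite index of the kind described might look like the sketch below. The index name is made up, REGION stands in for whichever search-criteria column is indexed, and ID is assumed to be the join key because the query joins on customer.ID. With both columns in the index, Oracle can filter on the region and read the join key without visiting the table.

```sql
-- Hypothetical index name; REGION stands in for the search-criteria column.
-- ID is assumed to be the key that product.CUSTOMER_ID joins to.
create index customer_region_id_idx on customer (region, id);
```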

Each (... AND ... AND ...) block is expected to match around 50,000 rows. The columns used in the search criteria all take values from a set of about 1,000 values.

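I would like to know what approach I could take given that I will only ever run count(*) queries, especially since Oracle has a built-in OLAP module (and a CUBE operation?). I'm also sure things could be greatly improved with well-chosen indexes and hints.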

How would you design this?

Best answer

This looks like a good candidate for bitmap indexes:

Bitmap indexes are primarily designed for data warehousing or environments in which queries reference many columns in an ad hoc fashion. Situations that may call for a bitmap index include:

The indexed columns have low cardinality, that is, the number of distinct values is small compared to the number of table rows.

The indexed table is either read-only or not subject to significant modification by DML statements.
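
One rough way to check the low-cardinality condition against the tables in the question is to compare distinct values with total rows. This is only a sketch; it assumes the REGION and CATEGORY columns from the query above, and on 30 million rows each statement will itself take a while to run.

```sql
-- Compare the number of distinct values with the total row count.
select count(*) as total_rows,
       count(distinct region) as distinct_regions
from customer;

select count(*) as total_rows,
       count(distinct category) as distinct_categories
from product;
```

With roughly 1,000 distinct values in about 30 million rows, as the question describes, the ratio is very small.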

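Specifically, a bitmap join index may be ideal here. The example in the manual even matches your data model. I tried to recreate your model and data below, and the bitmap join index appears to run orders of magnitude faster than the other solutions.

Sample data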

--Create tables
create table customer
(
customer_id number,
region varchar2(100) not null
) nologging;

create table product
(
product_id number,
customer_id number not null,
category varchar2(100) not null
) nologging;


--Load 30M rows, 1M rows at a time. Takes about 6 minutes.
begin
for i in 1 .. 30 loop
insert /*+ append */ into customer
select (1000000*i)+level, 'Region '||trunc(dbms_random.value(1, 1000))
from dual connect by level <= 1000000;
commit;

insert /*+ append */ into product
select (1000000*i)+level, (1000000*i)+level
,'Category '||trunc(dbms_random.value(1, 1000))
from dual connect by level <= 1000000;
commit;
end loop;
end;
/

--Add primary keys and foreign key constraints.
alter table customer add constraint customer_pk primary key (customer_id);
alter table product add constraint product_pk primary key (product_id);
alter table product add constraint product_customer_fk
foreign key (customer_id) references customer(customer_id);

--Gather stats
begin
dbms_stats.gather_table_stats(user, 'CUSTOMER');
dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

No indexes - slow

As expected, performance is poor. This sample query takes about 75 seconds on my machine.

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.CATEGORY = 'Category 1' AND customer.REGION = 'Region 1')
OR (product.CATEGORY = 'Category 2' AND customer.REGION = 'Region 2')
OR (product.CATEGORY = 'Category 888' AND customer.REGION = 'Region 888');

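B-tree indexes - still slow

The plan changes but performance stays the same. I think this may be because my example is a worst-case scenario for indexes, where the data is truly random.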

create index customer_idx on customer(region);
create index product_idx on product(category);

begin
dbms_stats.gather_table_stats(user, 'CUSTOMER');
dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/
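
To see how the plan changes after each indexing step, one option is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. The sketch below reuses the sample query from the unindexed test and assumes the default PLAN_TABLE is available to the session.

```sql
-- Store the optimizer's plan for the sample query in PLAN_TABLE.
explain plan for
select count(*)
from product
join customer on product.customer_id = customer.customer_id
where (product.category = 'Category 1' and customer.region = 'Region 1')
   or (product.category = 'Category 2' and customer.region = 'Region 2')
   or (product.category = 'Category 888' and customer.region = 'Region 888');

-- Display the plan that was just stored.
select * from table(dbms_xplan.display);
```

Running this before and after creating each set of indexes shows whether the optimizer actually picks them up.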

Bitmap indexes - a little better

This improves performance slightly, to about 61 seconds.

drop index customer_idx;
drop index product_idx;

create bitmap index customer_bidx on customer(region);
create bitmap index product_bidx on product(category);

begin
dbms_stats.gather_table_stats(user, 'CUSTOMER');
dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

Bitmap join index - incredibly fast

Now the query returns almost instantly; my IDE reports it as 0 seconds.

drop index customer_bidx;
drop index product_bidx;

create bitmap index customer_product_bjix
on product(product.category, customer.region)
FROM product, customer
where product.CUSTOMER_ID = customer.customer_id;

begin
dbms_stats.gather_table_stats(user, 'CUSTOMER');
dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/
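
As a quick sanity check that the index was created as a join index, the data dictionary can be queried. This is a sketch that assumes the index lives in the current schema; INDEX_TYPE and JOIN_INDEX are columns of the USER_INDEXES view.

```sql
-- JOIN_INDEX indicates whether the index is a join index.
select index_name, index_type, join_index
from user_indexes
where index_name = 'CUSTOMER_PRODUCT_BJIX';
```

Note that Oracle only allows a bitmap join index when the dimension table's join column is covered by a primary key or unique constraint, which CUSTOMER_PK provides in the sample data.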

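Index costs

The bitmap join index takes slightly longer to create than the B-tree or bitmap indexes. The B-tree indexes are very large compared to the bitmap and bitmap join indexes.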

select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('CUSTOMER_IDX', 'PRODUCT_IDX'
,'CUSTOMER_BIDX', 'PRODUCT_BIDX', 'CUSTOMER_PRODUCT_BJIX');


SEGMENT_NAME             MB
---------------------   ---
CUSTOMER_IDX            726
PRODUCT_IDX             792
CUSTOMER_BIDX            88
PRODUCT_BIDX             96
CUSTOMER_PRODUCT_BJIX   184

Query style

This doesn't affect performance, but you can shorten the query like this:

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.category, customer.region)
in (('Category 1', 'Region 1'),
('Category 2', 'Region 2'),
('Category 888', 'Region 888'));

About oracle - Designing SQL and indexes to improve count(*) query performance: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16921897/
