
sql - Retrieving numbers within ranges with Oracle SQL

Reposted. Author: 行者123. Updated: 2023-12-05 05:34:17

My query is very slow because it scans millions of records. The query counts how many numbers fall within specific ranges.

I have 2 tables: numbers_in_ranges and person

Create table numbers_in_ranges
( range_id number(9,0) ,
begin_range number(9,0),
end_range number(9,0)
) ;

Create table person
(
id integer,
a_number varchar(9),
first_name varchar(25),
last_name varchar(25)
);

Data in numbers_in_ranges

range_id| begin_range | end_range
--------|-------------|-----------
101 | 100000000 | 200000000
102 | 210000000 | 290000000
103 | 350000000 | 459999999
104 | 461000000 | 569999999
106 | 241000000 | 241999999
etc.

Data in person

id | a_number | first_name | last_name
---|------------|------------|-----------
1 | 100000001 | Maria | Doe
2 | 100000999 | Emily | Davis
3 | 150000000 | Dave | Smith
4 | 461000000 | Jane | Jones
6 | 241000001 | John | Doe
7 | 100000002 | Maria | Doe
8 | 100009999 | Emily | Davis
9 | 150000010 | Dave | Smith
10 | 210000001 | Jane | Jones
11 | 210000010 | John | Doe
12 | 281000000 | Jane | Jones
13 | 241000000 | John | Doe
14 | 460000001 | Maria | Doe
15 | 500000999 | Emily | Davis
16 | 550000010 | Dave | Smith
17 | 461000010 | Jane | Jones
18 | 241000020 | John | Doe
etc.

We fetch the range data from a remote database over a database link and store it in a materialized view.

Query

select nums.range_id, count(p.a_number) as a_count
from numbers_in_ranges nums
left join person p on to_number(p.a_number)
  between nums.begin_range and nums.end_range
group by nums.range_id;

The result looks like this:

range_id| a_count 
--------|------------------------
101 | 6
102 | 5
103 | 2
104 | 3
etc.
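As a reference point for what the query computes, here is a brute-force Python sketch. The in-memory lists are hypothetical stand-ins for the two tables and hold only a few of the sample rows. count_per_range is an illustrative helper and is not part of the question. It compares every number with every range, so its cost grows with (number of ranges) × (number of people). Thanks to the LEFT JOIN semantics, a range with no matching number still appears with a count of 0.

```python
# Hypothetical in-memory stand-ins for the two tables (a few sample rows only).
ranges = [  # (range_id, begin_range, end_range)
    (101, 100000000, 200000000),
    (102, 210000000, 290000000),
    (103, 350000000, 459999999),
    (104, 461000000, 569999999),
]
# person.a_number is varchar(9), so the values arrive as strings.
person_numbers = ["100000001", "150000000", "461000000", "210000001", "460000001"]


def count_per_range(ranges, numbers):
    """Brute-force equivalent of the LEFT JOIN ... GROUP BY query.

    Every range appears in the result (0 when nothing matches), and each
    number is compared with each range: O(len(ranges) * len(numbers)).
    """
    values = [int(n) for n in numbers]  # mirrors to_number(p.a_number)
    result = {}
    for range_id, begin, end in ranges:
        # BETWEEN is inclusive on both ends.
        result[range_id] = sum(1 for v in values if begin <= v <= end)
    return result


print(count_per_range(ranges, person_numbers))
# {101: 2, 102: 1, 103: 0, 104: 1}
```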

As I said, this query is very slow.

Here is the explain plan:

Plan hash value: 3785994407

---------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9352 | 264K| | 42601 (31)| 00:00:02 | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | P->S | QC (RAND) |
| 3 | HASH GROUP BY | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,01 | P->P | HASH |
| 6 | HASH GROUP BY | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,01 | PCWP | |
| 7 | MERGE JOIN OUTER | | 2084M| 56G| | 37793 (23)| 00:00:02 | Q1,01 | PCWP | |
| 8 | SORT JOIN | | 9352 | 173K| | 3 (34)| 00:00:01 | Q1,01 | PCWP | |
| 9 | PX BLOCK ITERATOR | | 9352 | 173K| | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
| 10 | MAT_VIEW ACCESS FULL | NUMBERS_IN_RANGES | 9352 | 173K| | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
|* 11 | FILTER | | | | | | | Q1,01 | PCWP | |
|* 12 | SORT JOIN | | 89M| 850M| 2732M| 29681 (1)| 00:00:02 | Q1,01 | PCWP | |
| 13 | BUFFER SORT | | | | | | | Q1,01 | PCWC | |
| 14 | PX RECEIVE | | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,01 | PCWP | |
| 15 | PX SEND BROADCAST | :TQ10000 | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | P->P | BROADCAST |
| 16 | PX BLOCK ITERATOR | | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | PCWC | |
| 17 | INDEX FAST FULL SCAN| PERSON_AN_IDX | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

11 - filter("NUMS"."END_RANGE">=TO_NUMBER("P"."A_NUMBER"(+)))
12 - access("NUMS"."BEGIN_RANGE"<=TO_NUMBER("P"."A_NUMBER"(+)))
filter("NUMS"."BEGIN_RANGE"<=TO_NUMBER("P"."A_NUMBER"(+)))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 16 because of degree limit

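I tried running the month's deltas and appending them to the table. If a new range_id is found, insert it; if an existing range_id is found, update it. That way we would not have to scan the whole table.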

But this solution did not work, because some ranges get updated and splicing occurs, for example:

We create a new range_id = 110 with a range between 100110000 and 210000001. Then range_id = 101 is spliced to 100000000 through 100110000, and range_id = 102 is spliced to 100110001 through 210000000.

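Now I want to create a trigger that updates this table whenever a range is created or updated. However, that is not possible: we get this data from a remote database and store it in a materialized view, and we cannot put a trigger on a read-only materialized view.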

My question: is there another way to do this, or a way to optimize this query?

Thanks!

Best answer

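The problem is that Oracle tries to broadcast the table containing all the IDs (the PX SEND BROADCAST of the 89M-row index in the plan), which looks odd for this case.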

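However, you only need to count rows, and the intervals (apparently) do not overlap. So you can improve performance and avoid joining the two datasets by using a trick. First, convert the data into an event stream in which each begin and end value marks the start and the end of a series. Then count the number of value events within each series. This lets you use match_recognize, which is much faster than the join.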
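The event-stream idea can be sketched in Python as a single sort followed by one linear sweep. This is an illustration of the technique under the answer's assumption that the ranges do not overlap, not the answer's actual code. count_with_events and the numeric type codes START, VALUE and END are hypothetical names, chosen to mirror the '01_START' / '02_val' / '03_END' labels used in the SQL.

```python
# Numeric type codes that sort like the SQL labels '01_START' < '02_val' < '03_END'.
START, VALUE, END = 1, 2, 3


def count_with_events(ranges, values):
    """Count values per range with one sort and one linear sweep.

    Assumes the ranges do not overlap. ranges: (range_id, begin, end) tuples;
    values: ints. Values outside every range are ignored.
    """
    events = []
    for range_id, begin, end in ranges:
        events.append((begin, START, range_id))
        events.append((end, END, range_id))
    for v in values:
        events.append((v, VALUE, None))
    # ORDER BY val, val_type: at equal positions START < VALUE < END,
    # so a value equal to a boundary is counted (inclusive BETWEEN).
    events.sort(key=lambda e: (e[0], e[1]))

    counts = {}
    current = None  # range_id of the open range, or None between ranges
    for _, kind, range_id in events:
        if kind == START:
            current = range_id
            counts[range_id] = 0
        elif kind == VALUE:
            if current is not None:
                counts[current] += 1
        else:  # END closes the open range
            current = None
    return counts


print(count_with_events(
    [(101, 100000000, 200000000), (104, 461000000, 569999999)],
    [100000001, 200000000, 460000001, 461000000],
))
# {101: 2, 104: 1}
```

The design choice is that the type code breaks ties. With START before VALUE before END at equal positions, 200000000 (equal to end_range) and 461000000 (equal to begin_range) are counted, while 460000001, which falls between two ranges, is skipped. If ranges overlapped, a single "current" range would no longer be enough. This is why the non-overlap assumption matters.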

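The query would be as follows. Here ranges_table(id, from_num, to_num) plays the role of numbers_in_ranges(range_id, begin_range, end_range), and other_table.id plays the role of the numeric a_number from person: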

with ranges_unpivot as (
/*Transform from_ ... to_... to the event-like structure*/
select
id
, val
, val_type
from ranges_table
unpivot(
val for val_type in (from_num as '01_START', to_num as '03_END')
)

union all

/*Append the rest of the data to the event stream*/
select
null,
id,
/*
This should be ordered between START mark and END mark
to process edge cases correctly
*/
'02_val'
from other_table
where id <= (select max(to_num) from ranges_table)
)

select /*+parallel(4) gather_plan_statistics*/ *
from ranges_unpivot
match_recognize (
order by val asc, val_type asc
measures
start_.id as range_id,
count(values_.val) as count_
pattern (start_ values_* end_)
define
start_ as val_type = '01_START',
values_ as val_type = '02_val',
end_ as val_type = '03_END'
)
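To see what the match_recognize clause does with the rows produced by ranges_unpivot, the following Python sketch encodes each ordered row as one character and runs the same pattern as an ordinary regular expression. It is only an emulation for illustration: match_ranges and CODES are hypothetical names, and Oracle does not evaluate the pattern this way internally. It relies on re.finditer scanning left to right and resuming after each match, which corresponds to the default AFTER MATCH SKIP PAST LAST ROW behavior of match_recognize.

```python
import re

# One character per row, mirroring the DEFINE clause:
# S = start_ ('01_START'), V = values_ ('02_val'), E = end_ ('03_END').
CODES = {"01_START": "S", "02_val": "V", "03_END": "E"}


def match_ranges(rows):
    """Emulate the match_recognize clause on (id, val, val_type) rows.

    rows has the same shape as ranges_unpivot; returns (range_id, count_) pairs.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))  # ORDER BY val, val_type
    stream = "".join(CODES[r[2]] for r in ordered)
    result = []
    # PATTERN (start_ values_* end_); finditer resumes after each match,
    # like the default AFTER MATCH SKIP PAST LAST ROW.
    for m in re.finditer(r"SV*E", stream):
        range_id = ordered[m.start()][0]  # MEASURES start_.id
        count_ = m.end() - m.start() - 2  # number of values_ rows between S and E
        result.append((range_id, count_))
    return result


rows = [
    (101, 100000000, "01_START"), (101, 200000000, "03_END"),
    (103, 350000000, "01_START"), (103, 459999999, "03_END"),
    (None, 100000001, "02_val"), (None, 200000000, "02_val"),
    (None, 300000000, "02_val"),  # outside every range: never part of a match
]
print(match_ranges(rows))
# [(101, 2), (103, 0)]
```

The ordered stream here is "SVVEVSE". The lone V between the two ranges cannot start a match and is skipped. Range 103 still produces a row with count 0, because values_* may match zero rows.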

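For this query, the runtime plan statistics (gather_plan_statistics) show an actual elapsed time (A-Time) of about 0.33 s: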

| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.33

join查询相比:

select /*+gather_plan_statistics*/
rt.id as range_id,
count(ot.id) as count_
from ranges_table rt
left join other_table ot
on rt.from_num <= ot.id
and rt.to_num >= ot.id
group by rt.id

which shows an A-Time of about 13.84 s:

| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:13.84 |

See the db<>fiddle.

Regarding sql - Retrieving numbers within ranges with Oracle SQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73656086/
