SQL: optimizing a BETWEEN clause

Reposted. Author: 行者123. Updated: 2023-12-04 09:03:11
I wrote a statement that takes almost an hour to run, so I am asking for help to make it faster. Here we go:

I am doing an inner join of two tables:

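I have many time intervals stored in intervals, and I only want the measure data from measures that falls within those intervals.
intervals : has two columns, the start time and the end time of the interval (number of rows = 1295)
measures : has two columns, the measure and the time the measure was taken (number of rows = one million)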

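The result I want is a table with the measure in the first column, then the time the measure was taken, then the start/end time of the interval considered (these repeat for every row whose time falls within that interval).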

Here is my code:

select measures.measure as measure, measures.time as time, intervals.entry_time as entry_time, intervals.exit_time as exit_time
from
intervals
inner join
measures
on intervals.entry_time<=measures.time and measures.time <=intervals.exit_time
order by time asc

Thanks

Best answer

This is quite a common problem.

Plain B-Tree indexes are not a good fit for a query like this:

SELECT  measures.measure as measure,
measures.time as time,
intervals.entry_time as entry_time,
intervals.exit_time as exit_time
FROM intervals
JOIN measures
ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
time ASC

An index is good for searching for values that lie within given bounds, but not for searching for the bounds that contain a given value.

This article in my blog explains the problem in more detail:
  • Adjacency list vs. nested sets: MySQL

(the nested sets model deals with a similar type of predicate).
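
To make the distinction concrete, here is a minimal sketch of the two predicate shapes against the tables from the question. The bind variables :lo, :hi and :t are placeholders introduced for illustration only.

```sql
-- Range search: the bounds are known, the column is searched.
-- A B-Tree index on measures (time) can seek to :lo and scan forward until :hi.
SELECT  measure, time
FROM    measures
WHERE   time BETWEEN :lo AND :hi;

-- Containment search: the value is known, the bounds are searched.
-- An index on intervals (entry_time) only narrows down entry_time <= :t;
-- exit_time >= :t still has to be checked for every remaining row.
SELECT  entry_time, exit_time
FROM    intervals
WHERE   :t BETWEEN entry_time AND exit_time;
```

The join in the question can be evaluated either way, depending on which table leads, which is why the choice of the leading table matters so much here.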

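    You can create an index on time. This way intervals will lead in the join, and a range scan on time will be used inside the nested loops. This will require sorting on time.

    You can create a spatial index on intervals (available in MySQL with MyISAM storage) that includes start and end in a single geometry column. This way measures can lead in the join and no sorting will be needed.

    Spatial indexes, however, are slower, so this will only be efficient if you have few measures but many intervals.

    Since you have few intervals but many measures, just make sure you have an index on measures.time: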
    CREATE INDEX ix_measures_time ON measures (time)
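
    One way to check that the index is actually picked up is to look at the execution plan. The sketch below assumes Oracle (the test script later in this answer is Oracle); other databases have their own EXPLAIN syntax.

```sql
EXPLAIN PLAN FOR
SELECT  measures.measure, measures.time,
        intervals.entry_time, intervals.exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        measures.time;

-- Display the plan that was just explained.
SELECT  *
FROM    TABLE(DBMS_XPLAN.DISPLAY);
```

    If the index is used as intended, the plan would typically show NESTED LOOPS with an INDEX RANGE SCAN on ix_measures_time. The exact output depends on the Oracle version and on table statistics.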

    Update:

    Here is a sample script to test with:
    BEGIN
    DBMS_RANDOM.seed(20091223);
    END;
    /

    CREATE TABLE intervals (
    entry_time NOT NULL,
    exit_time NOT NULL
    )
    AS
    SELECT TO_DATE('23.12.2009', 'dd.mm.yyyy') - level,
    TO_DATE('23.12.2009', 'dd.mm.yyyy') - level + DBMS_RANDOM.value
    FROM dual
    CONNECT BY
    level <= 1500
    /

    CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time)
    /

    CREATE TABLE measures (
    time NOT NULL,
    measure NOT NULL
    )
    AS
    SELECT TO_DATE('23.12.2009', 'dd.mm.yyyy') - level / 720,
    CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18, 2))
    FROM dual
    CONNECT BY
    level <= 1080000
    /

    ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time)
    /

    CREATE INDEX ix_measures_time_measure ON measures (time, measure)
    /

    This query:
    SELECT  SUM(measure), AVG(time - TO_DATE('23.12.2009', 'dd.mm.yyyy'))
    FROM (
    SELECT *
    FROM (
    SELECT /*+ ORDERED USE_NL(intervals measures) */
    *
    FROM intervals
    JOIN measures
    ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time
    ORDER BY
    time
    )
    WHERE rownum <= 500000
    )

    uses NESTED LOOPS and returns in 1.7 seconds.

    This query:
    SELECT  SUM(measure), AVG(time - TO_DATE('23.12.2009', 'dd.mm.yyyy'))
    FROM (
    SELECT *
    FROM (
    SELECT /*+ ORDERED USE_MERGE(intervals measures) */
    *
    FROM intervals
    JOIN measures
    ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time
    ORDER BY
    time
    )
    WHERE rownum <= 500000
    )

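    uses MERGE JOIN, and I had to stop it after 5 minutes.

    Update 2:

    You will most probably need to force the engine to use the correct table order in the join, with a hint like this: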
    SELECT  /*+ LEADING (intervals) USE_NL(intervals, measures) */
    measures.measure as measure,
    measures.time as time,
    intervals.entry_time as entry_time,
    intervals.exit_time as exit_time
    FROM intervals
    JOIN measures
    ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time
    ORDER BY
    time ASC
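
    Oracle's optimizer is not smart enough to see that the intervals do not intersect. That is why it will most probably use measures as the leading table (which would be a wise decision if the intervals did intersect).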
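
    Whether the intervals really are disjoint can be checked before relying on that. This is a minimal sketch, assuming Oracle analytic functions and the intervals table from the test script; max_prev_exit is a name introduced here for illustration.

```sql
-- Lists every interval that starts before an earlier-starting interval has ended.
-- An empty result means no two intervals overlap.
SELECT  entry_time, exit_time, max_prev_exit
FROM    (
        SELECT  entry_time, exit_time,
                MAX(exit_time) OVER (
                    ORDER BY entry_time
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                ) AS max_prev_exit
        FROM    intervals
        )
WHERE   max_prev_exit >= entry_time;
```

    The comparison uses >= because BETWEEN is inclusive on both ends, so two intervals that merely touch at an endpoint can both match the same measure.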

    Update 3:
    WITH    splits AS
    (
    SELECT /*+ MATERIALIZE */
    entry_range, exit_range,
    exit_range - entry_range + 1 AS range_span,
    entry_time, exit_time
    FROM (
    SELECT TRUNC((entry_time - TO_DATE(1, 'J')) * 2) AS entry_range,
    TRUNC((exit_time - TO_DATE(1, 'J')) * 2) AS exit_range,
    entry_time,
    exit_time
    FROM intervals
    )
    ),
    upper AS
    (
    SELECT /*+ MATERIALIZE */
    MAX(range_span) AS max_range
    FROM splits
    ),
    ranges AS
    (
    SELECT /*+ MATERIALIZE */
    level AS chunk
    FROM upper
    CONNECT BY
    level <= max_range
    ),
    tiles AS
    (
    SELECT /*+ MATERIALIZE USE_MERGE (r s) */
    entry_range + chunk - 1 AS tile,
    entry_time,
    exit_time
    FROM ranges r
    JOIN splits s
    ON chunk <= range_span
    )
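    SELECT /*+ LEADING(t) USE_HASH(m t) */
    -- The original aggregated LENGTH(stuffing); the test script above creates no
    -- stuffing column, so measure is aggregated here instead.
    SUM(m.measure)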
    FROM tiles t
    JOIN measures m
    ON TRUNC((m.time - TO_DATE(1, 'J')) * 2) = tile
    AND m.time BETWEEN t.entry_time AND t.exit_time

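    This query splits the time axis into ranges and uses a HASH JOIN to join the measures and the timestamps on the range values, with the fine filtering applied afterwards.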
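
    The tile arithmetic can be tried on a single interval. In this sketch the two timestamps are made-up sample values; the expressions are the same ones used in the splits subquery. Multiplying the day difference by 2 gives half-day tiles.

```sql
SELECT  TRUNC((entry_time - TO_DATE(1, 'J')) * 2) AS entry_range,
        TRUNC((exit_time  - TO_DATE(1, 'J')) * 2) AS exit_range,
        TRUNC((exit_time  - TO_DATE(1, 'J')) * 2)
      - TRUNC((entry_time - TO_DATE(1, 'J')) * 2) + 1 AS range_span
FROM    (
        -- Sample interval: starts in the first half of 22.12.2009,
        -- ends in the first half of 23.12.2009.
        SELECT  TO_DATE('22.12.2009 10:00', 'dd.mm.yyyy hh24:mi') AS entry_time,
                TO_DATE('23.12.2009 02:00', 'dd.mm.yyyy hh24:mi') AS exit_time
        FROM    dual
        );
```

    For this interval range_span should come out as 3, since it touches three half-day tiles. The tiles subquery therefore emits three rows for it, and a measure is hash-joined only to the intervals that share its tile before the exact BETWEEN check runs.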

    See this article in my blog for a more detailed explanation of how it works:
  • Oracle: joining timestamps and time intervals

    Regarding "SQL: optimizing a BETWEEN clause", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1947693/