gpt4 book ai didi

sql - 使用通用的相关子查询有效地提取不同的列

转载 作者:行者123 更新时间:2023-12-01 07:21:32 25 4
gpt4 key购买 nike

我需要从子查询中提取多个列,这也需要一个 WHERE 过滤器引用 FROM 表的列。我对此有几个问题:

  • 除了我的下面这个问题还有其他解决方案吗?
  • 是否需要另一种解决方案,或者这种解决方案是否足够有效?

  • 例子:

    在下面的示例中,我正在编写一个 View 来显示考试成绩,特别是发现可能需要解决或重考的失败。

    我不能简单地使用 JOIN,因为我需要先过滤我的实际子查询(注意我正在为“考生”获得 TOP 1,按分数或日期降序排序)

    我的目标是避免重复编写(和执行)本质上相同的子查询。
    SELECT ExamineeID, LastName, FirstName, Email,
    (SELECT COUNT(examineeTestID)
    FROM exam.ExamineeTest tests
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
    (SELECT TOP 1 ExamineeTestID
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestExamineeTestID,
    (SELECT TOP 1 Score
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestScore,
    (SELECT TOP 1 DateDue
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestDateDue,
    (SELECT TOP 1 TimeCommitted
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY Score DESC) bestTimeCommitted,
    (SELECT TOP 1 ExamineeTestID
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentExamineeTestID,
    (SELECT TOP 1 Score
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentScore,
    (SELECT TOP 1 DateDue
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentDateDue,
    (SELECT TOP 1 TimeCommitted
    FROM exam.ExamineeTest T
    WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
    ORDER BY DateDue DESC) currentTimeCommitted
    FROM exam.Examinee E

    最佳答案

    先回答你的第二个问题,是的,有更好的方法是为了,因为你使用的查询很难理解,很难维护,即使现在性能可以接受,查询同一个表多个也是一种耻辱如果您的应用程序增长到可观的大小,有时您不需要增加性能可能并不总是可以接受的。

    为了回答你的第一个问题,我为你提供了一些方法。除非另有说明,否则这些假设是 SQL 2005 或更高版本。

    请注意,您不需要 BestExamineeID 和 CurrentExamineeID,因为它们将始终与 ExamineeID 相同,除非未进行任何测试并且它们为 NULL,您可以从其他列为 NULL 的列中看出这一点。

    您可以将 OUTER/CROSS APPLY 视为一个运算符,它允许您将相关子查询从 WHERE 子句移动到 JOIN 子句中。它们可以具有对先前命名的表的外部引用,并且可以返回多个列。这使您可以为每个逻辑查询仅执行一次工作,而不是为每一列执行一次。

    SELECT
    ExamineeID,
    LastName,
    FirstName,
    Email,
    B.Attempts,
    BestScore = B.Score,
    BestDateDue = B.DateDue,
    BestTimeCommitted = B.TimeCommitted,
    CurrentScore = C.Score,
    CurrentDateDue = C.DateDue,
    CurrentTimeCommitted = C.TimeCommitted
    FROM
    exam.Examinee E
    OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
    SELECT TOP 1
    Score, DateDue, TimeCommitted,
    Attempts = Count(*) OVER ()
    FROM exam.ExamineeTest T
    WHERE
    E.ExamineeID = T.ExamineeID
    AND T.TestRevisionID = 3
    AND T.TestID = 2
    ORDER BY Score DESC
    ) B
    OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
    SELECT TOP 1
    Score, DateDue, TimeCommitted
    FROM exam.ExamineeTest T
    WHERE
    E.ExamineeID = T.ExamineeID
    AND T.TestRevisionID = 3
    AND T.TestID = 2
    ORDER BY DateDue DESC
    ) C

    你应该试验一下我的 Count(*) OVER ()比拥有额外的 OUTER APPLY 更好这只是得到计数。如果您不限制考生来自 exam.Examinee表,最好在派生表中进行普通聚合。

    这是另一种方法,它(某种程度上)可以一次性获取所有数据。可以想象,它可以比其他查询执行得更好,但我的经验是,在某些情况下,窗口函数可能会变得非常昂贵且令人惊讶,因此需要进行测试。
    WITH Data AS (
    SELECT
    *,
    Count(*) OVER (PARTITION BY ExamineeID) Cnt,
    Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
    Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
    FROM
    exam.ExamineeTest
    ), Vals AS (
    SELECT
    ExamineeID,
    Max(Cnt) Attempts,
    Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
    Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
    Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
    Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) BestScore,
    Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
    Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted
    FROM Data
    GROUP BY
    ExamineeID
    )
    SELECT
    E.ExamineeID,
    E.LastName,
    E.FirstName,
    E.Email,
    V.Attempts,
    V.BestScore, V.BestDateDue, V.BestTimeCommitted,
    V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
    FROM
    exam.Examinee E
    LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
    -- change join to INNER if you only want examinees who've tested

    最后,这是一个 SQL 2000 方法:
    SELECT
    E.ExamineeID,
    E.LastName,
    E.FirstName,
    E.Email,
    Y.Attempts,
    Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
    Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
    FROM
    exam.Examinee E
    LEFT JOIN ( -- change to inner if you only want examinees who've tested
    SELECT
    X.ExamineeID,
    X.Cnt Attempts,
    Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
    Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
    Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
    Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
    Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
    Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
    FROM
    (
    SELECT ExamineeID, Max(Score) MaxScore, Max(DueDate) MaxDueDate, Count(*) Cnt
    FROM exam.ExamineeTest
    WHERE
    TestRevisionID = 3
    AND TestID = 2
    GROUP BY ExamineeID
    ) X
    CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
    INNER JOIN exam.ExamineeTest T
    ON X.ExamineeID = T.ExamineeID
    AND (
    (Y.Which = 1 AND X.MaxScore = T.MaxScore)
    OR (Y.Which = 2 AND X.MaxDueDate = T.MaxDueDate)
    )
    WHERE
    T.TestRevisionID = 3
    AND T.TestID = 2
    GROUP BY
    X.ExamineeID,
    X.Cnt
    ) Y ON E.ExamineeID = Y.ExamineeID

    如果 (ExamineeID, Score) 或 (ExamineeID, DueDate) 的组合可以返回多行,则此查询将返回意外的额外行。对于 Score,这可能不太可能。如果两者都不是唯一的,那么您需要使用(或添加)一些可以授予唯一性的附加列,以便它可以用来选择一行。如果只有 Score 可以复制,那么首先获得最大 Score 的附加预查询,然后与最大 DueDate 相吻合,将结合拉取与最高分并列的最新分数,同时获得最新分数数据。如果您需要更多 SQL 2000 帮助,请告诉我。

    注意:要控制 CROSS APPLY 还是 ROW_NUMBER() 解决方案更好的最重要的事情是您是否在正在查找的列上有索引,以及数据是密集还是稀疏。
  • 索引 + 你只拉了几个考生,每个考生都做了很多测试 = CROSS APPLY 获胜。
  • 索引 + 你进行了大量的检查,每次只有几个测试 = ROW_NUMBER() 获胜。
  • 无索引 = 字符串连接/值打包方法获胜(此处未显示)。

  • 我为 SQL 2000 提供的 group by 解决方案的性能可能最差,但不能保证。就像我说的,测试是有序的。

    如果我的任何查询确实导致性能问题,请告诉我,我会看看我能做些什么来提供帮助。我敢肯定我可能有错别字,因为我没有使用任何 DDL 来重新创建您的表,但我尽了最大努力没有尝试。

    如果性能确实变得至关重要,我将创建 ExamineeTestBest 和 ExamineeTestCurrent 表,这些表由 ExamineeTest 表上的触发器推送到,这些表将始终保持更新。然而,这是非规范化并且可能不是必需的或一个好主意,除非您已经扩展得如此之大以至于检索结果变得 Not Acceptable 地长。

    关于sql - 使用通用的相关子查询有效地提取不同的列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/6157830/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com