gpt4 book ai didi

oracle - 类似结果的 Oracle PARTITION BY 与 GROUP BY 的解释

转载 作者:行者123 更新时间:2023-12-04 00:48:51 25 4
gpt4 key购买 nike

我有下表(标记):

firstname    lastname    Mark    
------------------------------
arun prasanth 40
ann antony 45
sruthy abc 41
new abc 47
arun prasanth 45
arun prasanth 49
ann antony 49

并且想添加一个列,该列标记具有特定列的记录是否出现多次。这是结果:
firstname    lastname    Mark    MULTI_FLAG
----------------------------------------------
arun prasanth 40 1
ann antony 45 1
sruthy abc 41 0
new abc 47 0
arun prasanth 45 1
arun prasanth 49 1
ann antony 49 1

我可以通过以下 GROUP BY 查询获得结果:
SELECT M1.firstname
,M1.lastname
,M1.Mark
,M2.MULTI_COUNT
FROM Marks M1
JOIN (SELECT firstname, lastname, CASE WHEN COUNT (*) > 1 THEN 1 ELSE 0 END AS MULTI_COUNT
FROM Marks
GROUP BY firstname, lastname) M2
ON M2.firstname = M1.firstname AND M2.lastname = M1.lastname;

或者通过这个更漂亮的 PARTITION BY 查询:
SELECT
firstname,
lastname,
CASE WHEN COUNT(*) OVER (PARTITION BY
firstname,
lastname) > 1 THEN 1 ELSE 0 END AS MULTI_FLAG
FROM
Marks

在返回的类似大表上运行 GROUP BY 查询:
34 米 56 秒 595 毫秒

在返回的类似大表上运行 PARTITION BY 查询:
  • 第一次运行: 55 m 47 s 851 ms
  • 第二次运行: 36 m 46 s 95 ms

  • 我有兴趣知道:
  • 实现我的结果的最佳方式
  • 性能差异的原因。
  • 编辑:如何阅读查询计划。

  • 编辑:
    甲骨文版本
    Oracle 数据库 11g 企业版 11.2.0.3.0 版 - 64 位生产
    PL/SQL 版本 11.2.0.3.0 - 生产
    《核心 11.2.0.3.0 生产》
    适用于 Linux 的 TNS:版本 11.2.0.3.0 - 生产
    NLSRTL 版本 11.2.0.3.0 - 生产

    按计划分区
    PLAN_TABLE_OUTPUT
    Plan hash value: 3822227444

    ---------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
    ---------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 668K| 90M| | 90429 (1)| 00:18:06 |
    | 1 | WINDOW SORT | | 668K| 90M| 98M| 90429 (1)| 00:18:06 |
    |* 2 | HASH JOIN RIGHT OUTER | | 668K| 90M| | 69340 (1)| 00:13:53 |
    | 3 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
    | 4 | NESTED LOOPS | | | | | | |
    | 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
    | 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
    |* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
    | 8 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
    7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM"
    AND "PMIM"."ITERATION"="QCAB"."ITERATION")
    filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE
    'INPUT%')

    按计划分组
    PLAN_TABLE_OUTPUT
    Plan hash value: 648231064

    ------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 912 | 2052K| | 226K (1)| 00:45:22 |
    |* 1 | HASH JOIN | | 912 | 2052K| | 226K (1)| 00:45:22 |
    | 2 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
    |* 3 | HASH JOIN | | 89667 | 194M| 45M| 226K (1)| 00:45:22 |
    | 4 | NESTED LOOPS | | | | | | |
    | 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
    | 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
    |* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
    | 8 | TABLE ACCESS BY INDEX ROWID | Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
    | 9 | VIEW | | 668K| 1377M| | 86518 (1)| 00:17:19 |
    | 10 | HASH GROUP BY | | 668K| 72M| 80M| 86518 (1)| 00:17:19 |
    |* 11 | HASH JOIN RIGHT OUTER | | 668K| 72M| | 69340 (1)| 00:13:53 |
    | 12 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 2478 | | 3 (0)| 00:00:01 |
    | 13 | NESTED LOOPS | | | | | | |
    | 14 | NESTED LOOPS | | 377K| 35M| | 69335 (1)| 00:13:53 |
    | 15 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
    |* 16 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
    | 17 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 1701 | | 4 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - access("R2"."TRIAL_COUNTRY_CD"="CRM"."COUNTRY_CD" AND
    UPPER("CRM"."COUNTRY")=UPPER("QCAB"."TRIAL_COUNTRY"))
    3 - access("R2"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "R2"."ITERATION"="QCAB"."ITERATION" AND
    "R2"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND "R2"."ASSUMPTION"="QCAB"."ASSUMPTION")
    7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
    "PMIM"."ITERATION"="QCAB"."ITERATION")
    filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
    11 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
    16 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
    "PMIM"."ITERATION"="QCAB"."ITERATION")
    filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')

    最佳答案

    通常,您从解析函数 count(*) 开始这导致了一个紧凑的 SQL。

    这种方法的缺点是必须对数据进行排序(请参阅 WINDOW SORT 操作)。 GROUP BY方法避免
    排序为 HASH GROUP BY可以使用,这可以带来更好的性能。

    您的示例涉及更多内容,因为您不使用表,而是使用连接三个表的 View - 对于 GROUP BY,此连接执行两次以及详细数据;哪一个
    当然不是最优的。

    所以我将从查询的分析函数版本开始(可能带有 PARALLEL 选项)。

    如果您想尝试 GROUP BY轻轨版本是可能的:

    1) 只对重复的键进行分组

    2) 制作 OUTER JOIN分配 MULTI_FLAG
    下面带有执行计划的示例 - 对您的数据进行简单测试

    with dups as (
    select firstname,lastname from tmp
    group by firstname,lastname
    having count(*) > 1)
    select tmp.FIRSTNAME, tmp.LASTNAME, tmp.MARK,
    case when dups.firstname is not NULL then 1 else 0 end as MULTI_FLAG
    from tmp
    left outer join dups on tmp.firstname = dups.firstname and tmp.lastname = dups.lastname;

    您仍然需要访问您的 View 两次,但最终加入会更快(特别是如果您只有少量重复键)。
    --------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
    --------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 105K| 26M| | 1673 (1)| 00:00:21 |
    |* 1 | HASH JOIN RIGHT OUTER| | 105K| 26M| 11M| 1673 (1)| 00:00:21 |
    | 2 | VIEW | | 105K| 10M| | 128 (4)| 00:00:02 |
    |* 3 | FILTER | | | | | | |
    | 4 | HASH GROUP BY | | 105K| 10M| | 128 (4)| 00:00:02 |
    | 5 | TABLE ACCESS FULL| TMP | 105K| 10M| | 125 (1)| 00:00:02 |
    | 6 | TABLE ACCESS FULL | TMP | 105K| 15M| | 125 (1)| 00:00:02 |
    --------------------------------------------------------------------------------------

    关于oracle - 类似结果的 Oracle PARTITION BY 与 GROUP BY 的解释,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55357476/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com