sql - 我如何优化这个组明智的最大 sql 查询？-6ren

sql - 我如何优化这个组明智的最大 sql 查询？

转载作者：行者123 更新时间：2023-11-29 13:28:11

这就是我想要的。它基本上从 daily_statistics 表中获取所有记录并按 user_id 分组。同时，它还做了以下事情:

用户的值按最近分组
attachment_ids 表示为一个数组，因此我可以确定用户有多少个附件

结果:

 user_id | country_id |       time_at       | assumed_gender |    attachment_ids
---------+------------+---------------------+----------------+----------------------
   21581 |        172 | 2015-04-18 17:55:00 |                | [5942]
   21610 |        140 | 2015-04-18 19:55:00 | male           | [5940]
   22044 |        174 | 2015-04-18 21:55:00 | female         | [12312313, 12312313]

   21353 |        174 | 2015-04-18 20:59:00 | male           | [5938]
   21573 |        246 | 2015-04-18 21:57:00 | male           | [5936]
(5 rows)

以下查询执行缓慢。大约 17 秒。

  SELECT
    ds.user_id,
    max(case when id=maxid then country_id end) AS country_id,
    max(case when id=maxid then time_at end) AS time_at,
    max(case when id=maxid then properties->'assumed_gender' end) AS assumed_gender,
    json_agg(to_json(attachment_id)) AS attachment_ids
  FROM daily_statistics ds JOIN (
      SELECT u.id as user_id, (
        SELECT ds2.id FROM daily_statistics ds2 WHERE ds2.user_id=u.id AND ds2.metric = 'participation' AND ds2.status = 'active' AND ds2.campaign_id = 39
        ORDER BY ds2.id DESC LIMIT 1
      ) AS maxid FROM users u
      WHERE u.properties -> 'provider' IN ('twitter')
  ) mu ON (ds.user_id=mu.user_id)
  WHERE ds.campaign_id = 39 AND ds.metric = 'participation' AND ds.status = 'active'
  GROUP BY ds.user_id;

问题在于 group wise max 语句。有没有办法优化这个查询并获得相同的输出？我在考虑使用某种横向连接？但是这样我就无法获得每个用户的 attachment_id 的数量。

编辑:对于 9k++ 的记录，表大小为 2m 行:执行此查询大约需要 25 秒。

foobar_production=> EXPLAIN ANALYZE SELECT
foobar_production->     ds.user_id,
foobar_production->     max(case when id=maxid then country_id end) AS country_id,
foobar_production->     max(case when id=maxid then time_at end) AS time_at,
foobar_production->     max(case when id=maxid then properties->'assumed_gender' end) AS assumed_gender,
foobar_production->     json_agg(to_json(attachment_id)) AS attachment_ids
foobar_production->   FROM daily_statistics ds JOIN (
foobar_production(>       SELECT u.id as user_id, (
foobar_production(>         SELECT ds2.id FROM daily_statistics ds2 WHERE ds2.user_id=u.id AND ds2.metric = 'participation' AND ds2.status = 'active' AND ds2.campaign_id = 4742
foobar_production(>         ORDER BY ds2.id DESC LIMIT 1
foobar_production(>       ) AS maxid FROM users u
foobar_production(>       WHERE u.properties -> 'provider' IN ('twitter')
foobar_production(>   ) mu ON (ds.user_id=mu.user_id)
foobar_production->   WHERE ds.campaign_id = 4742 AND ds.metric = 'participation' AND ds.status = 'active'
foobar_production->   GROUP BY ds.user_id;
                                                                                        QUERY PLAN                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2063.07..2063.08 rows=1 width=103) (actual time=25155.963..25156.859 rows=775 loops=1)
   ->  Nested Loop  (cost=0.98..1883.99 rows=2 width=103) (actual time=0.744..382.699 rows=2787 loops=1)
         ->  Index Scan using index_daily_statistics_on_campaign_id_and_type on daily_statistics ds  (cost=0.56..1621.73 rows=31 width=99) (actual time=0.107..33.513 rows=9751 loops=1)
               Index Cond: (campaign_id = 4742)
               Filter: (((metric)::text = 'participation'::text) AND ((status)::text = 'active'::text))
         ->  Index Scan using index_users_on_id_and_type on users u  (cost=0.42..8.45 rows=1 width=4) (actual time=0.024..0.024 rows=0 loops=9751)
               Index Cond: (id = ds.user_id)
               Filter: ((properties -> 'provider'::text) = 'twitter'::text)
               Rows Removed by Filter: 1
   SubPlan 1
     ->  Limit  (cost=29.83..29.84 rows=1 width=4) (actual time=2.953..2.954 rows=1 loops=2787)
           ->  Sort  (cost=29.83..29.84 rows=1 width=4) (actual time=2.951..2.951 rows=1 loops=2787)
                 Sort Key: ds2.id
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Bitmap Heap Scan on daily_statistics ds2  (cost=25.80..29.82 rows=1 width=4) (actual time=2.381..2.702 rows=105 loops=2787)
                       Recheck Cond: ((user_id = u.id) AND (campaign_id = 4742))
                       Filter: (((metric)::text = 'participation'::text) AND ((status)::text = 'active'::text))
                       ->  BitmapAnd  (cost=25.80..25.80 rows=1 width=0) (actual time=2.365..2.365 rows=0 loops=2787)
                             ->  Bitmap Index Scan on index_daily_statistics_on_user_id  (cost=0.00..5.60 rows=156 width=0) (actual time=0.072..0.072 rows=292 loops=2787)
                                   Index Cond: (user_id = u.id)
                             ->  Bitmap Index Scan on index_daily_statistics_on_campaign_id_and_type  (cost=0.00..19.95 rows=453 width=0) (actual time=2.241..2.241 rows=9751 loops=2787)
                                   Index Cond: (campaign_id = 4742)
   SubPlan 2
     ->  Limit  (cost=29.83..29.84 rows=1 width=4) (actual time=2.879..2.880 rows=1 loops=2787)
           ->  Sort  (cost=29.83..29.84 rows=1 width=4) (actual time=2.876..2.876 rows=1 loops=2787)
                 Sort Key: ds2_1.id
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Bitmap Heap Scan on daily_statistics ds2_1  (cost=25.80..29.82 rows=1 width=4) (actual time=2.241..2.585 rows=105 loops=2787)
                       Recheck Cond: ((user_id = u.id) AND (campaign_id = 4742))
                       Filter: (((metric)::text = 'participation'::text) AND ((status)::text = 'active'::text))
                       ->  BitmapAnd  (cost=25.80..25.80 rows=1 width=0) (actual time=2.222..2.222 rows=0 loops=2787)
                             ->  Bitmap Index Scan on index_daily_statistics_on_user_id  (cost=0.00..5.60 rows=156 width=0) (actual time=0.062..0.062 rows=292 loops=2787)
                                   Index Cond: (user_id = u.id)
                             ->  Bitmap Index Scan on index_daily_statistics_on_campaign_id_and_type  (cost=0.00..19.95 rows=453 width=0) (actual time=2.124..2.124 rows=9751 loops=2787)
                                   Index Cond: (campaign_id = 4742)
   SubPlan 3
     ->  Limit  (cost=29.83..29.84 rows=1 width=4) (actual time=3.030..3.030 rows=1 loops=2787)
           ->  Sort  (cost=29.83..29.84 rows=1 width=4) (actual time=3.018..3.018 rows=1 loops=2787)
                 Sort Key: ds2_2.id
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Bitmap Heap Scan on daily_statistics ds2_2  (cost=25.80..29.82 rows=1 width=4) (actual time=2.407..2.755 rows=105 loops=2787)
                       Recheck Cond: ((user_id = u.id) AND (campaign_id = 4742))
                       Filter: (((metric)::text = 'participation'::text) AND ((status)::text = 'active'::text))
                       ->  BitmapAnd  (cost=25.80..25.80 rows=1 width=0) (actual time=2.390..2.390 rows=0 loops=2787)
                             ->  Bitmap Index Scan on index_daily_statistics_on_user_id  (cost=0.00..5.60 rows=156 width=0) (actual time=0.121..0.121 rows=292 loops=2787)
                                   Index Cond: (user_id = u.id)
                             ->  Bitmap Index Scan on index_daily_statistics_on_campaign_id_and_type  (cost=0.00..19.95 rows=453 width=0) (actual time=2.233..2.233 rows=9751 loops=2787)
                                   Index Cond: (campaign_id = 4742)
 Total runtime: 25158.063 ms
(49 rows)


foobar_production=> \d daily_statistics;
                                       Table "public.daily_statistics"
    Column     |            Type             |                           Modifiers
---------------+-----------------------------+---------------------------------------------------------------
 id            | integer                     | not null default nextval('daily_statistics_id_seq'::regclass)
 type          | character varying(255)      |
 metric        | character varying(255)      |
 campaign_id   | integer                     |
 user_id       | integer                     |
 country_id    | integer                     |
 attachment_id | integer                     |
 time_at       | timestamp without time zone |
 properties    | hstore                      |
 status        | character varying(255)      | default 'active'::character varying
Indexes:
    "daily_statistics_pkey" PRIMARY KEY, btree (id)
    "index_daily_statistics_on_attachment_id" btree (attachment_id)
    "index_daily_statistics_on_campaign_id_and_type" btree (campaign_id, type)
    "index_daily_statistics_on_country_id" btree (country_id)
    "index_daily_statistics_on_id" btree (id)
    "index_daily_statistics_on_metric" btree (metric)
    "index_daily_statistics_on_properties" gin (properties)
    "index_daily_statistics_on_status" btree (status)
    "index_daily_statistics_on_time_at" btree (time_at)
    "index_daily_statistics_on_user_id" btree (user_id)

想法将不胜感激。

最佳答案

这里似乎有 2 个部分:

第一个是获取用户的最新统计数据
另一个正在为用户累积所有 attachment_id。

两者都是针对特定类型的统计。由于您对 users 感兴趣，所以我会首先从他们开始。

使用此查询搜索最新条目:

SELECT u.id,
       ds.country_id,
       ds.time_at,
       ds.properties->'assumed_gender' AS assumed_gender
  FROM users u
  JOIN LATERAL (
      SELECT * FROM daily_statistics
       WHERE user_id=u.id
         AND campaign_id = 39
         AND metric = 'participation'
         AND status = 'active'
       ORDER BY id DESC LIMIT 1
    ) ds ON true
 WHERE u.properties -> 'provider' IN ('twitter');

我使用 LATERAL syntax在这里，这非常适合此类查询。

聚合不会从中受益，因此需要另一个子查询。

我得到了以下最终查询:

SELECT u.id,
       ds.country_id,
       ds.time_at,
       ds.properties->'assumed_gender' AS assumed_gender,
       g.attachment_ids
  FROM users u
  JOIN LATERAL (
    SELECT * FROM daily_statistics
     WHERE user_id=u.id
       AND campaign_id = 39
       AND metric = 'participation'
       AND status = 'active'
     ORDER BY id DESC LIMIT 1
    ) ds ON true
  JOIN (
    SELECT user_id, json_agg(to_json(attachment_id)) AS attachment_ids
      FROM daily_statistics
     WHERE campaign_id = 39
       AND metric = 'participation'
       AND status = 'active'
     GROUP BY user_id
    ) g ON g.user_id=u.id
 WHERE u.properties -> 'provider' IN ('twitter');

我假设，那个指数:

CREATE INDEX i_ds_campaign4status
 ON daily_statistics(campaign_id, user_id, id)
 WHERE status='active';

会有所帮助。不过，这取决于您的数据，如果您的所有统计数据都active，则删除WHERE 子句。

编辑:根据提供的计划，第二个查询受益于与聚合的连接，因为它减少了 LATERAL 部分的迭代次数。我会坚持这种方法。

关于sql - 我如何优化这个组明智的最大 sql 查询？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/29748534/

文章推荐： php - MySQL PHP，表内表(每个条目多行)

文章推荐： sql - 哪个查询计划更快/更好

文章推荐： json - 将 epgsql 结果转换为 JSON

c++ - 为什么 MSVC 不为 char 或 const char* 优化 cout 而为 int 优化？
比较代码: const char x = 'a'; std::cout > (0C310B0h) 00C3100B add esp,4 和 const i
Matlab 优化
您好，我正在使用 Matlab 优化求解器，但程序有问题。我收到此消息 fmincon 已停止，因为目标函数值小于目标函数限制的默认值，并且约束满足在约束容差的默认值范围内。我也收到以下消息。警告:矩
Eclipse 优化
处理Visual Studio optimizations的问题为我节省了大量启动和使用它的时间当我必须进行 J2EE 开发时，我很难回到 Eclipse。因此，我还想知道人们是否有任何提示或技巧可
Excel 优化
情况如下:在我的 Excel 工作表中，有一列包含 1-name 形式的条目。考虑到数字也可以是两位数，我想删除这些数字。这本身不是问题，我让它工作了，只是性能太糟糕了。现在我的程序每个单元格输入大约
jQuery 优化
这样做有什么区别吗: $(".topHorzNavLink").click(function() { var theHoverContainer = $("#hoverContainer");
jQuery $(this) 优化
这个问题已经有答案了: 已关闭11 年前。 Possible Duplicate: What is the cost of '$(this)'? 我经常在一些开发人员代码中看到$(this)引用同一个
jQuery 优化
我刚刚结束了一个大型开发项目。我们的时间紧迫，因此很多优化被“推迟”。既然我们已经达到了最后期限，我们将回去尝试优化事情。我的问题是:优化 jQuery 网站时您要寻找的最重要的东西是什么。或者，我
JavaScript 优化
所以我一直在用 JavaScript 编写游戏(不是网络游戏，而是使用 JavaScript 恰好是脚本语言的游戏引擎)。不幸的是，游戏引擎的 JavaScript 引擎是 SpiderMonkey
MYSQL查询、优化
这是我在正在构建的页面中使用的 SQL 查询。它目前运行大约 8 秒并返回 12000 条记录，这是正确的，但我想知道您是否可以就如何使其更快提出可能的建议？ SELECT DISTINCT Adve
SQL 优化
如何优化这个？ SELECT e.attr_id, e.sku, a.value FROM product_attr AS e, product_attr_text AS a WHERE e.attr
python - 优化 `in`
我正在使用这样的结构来测试是否按下了所需的键: def eventFilter(self, tableView, event): if event.type() == QtCore.QEven
JavaScript 优化
我正在使用 JavaScript 从给定的球员列表中计算出羽毛球 double 比赛的所有组合。每个玩家都与其他人组队。 EG。如果我有以下球员a、b、c、d。它们的组合可以是: a & b V c
Javascript 优化
我似乎无法弄清楚如何让这个 JS 工作。 scroll function 起作用但不能隐藏。还有没有办法用更少的代码行来做到这一点？我希望 .down-arrow 在 50px 之后 fade out
CSS高级最小化(优化)
我的问题是关于用于生产的高级优化级联样式表 (CSS) 文件。多么最新和最完整(准备在实时元素中使用)的 css 优化器/最小化器，它们不仅提供删除空格和换行符，还提供高级功能，如删除过多的属性、合
HTML 优化
我读过这个: 浏览器检索在中请求的所有资源开始呈现之前的 HTML 部分.如果您将请求放在中section 而不是，那么页面呈现和下载资源可以并行发生。您应该从移动尽可能多的资源请求。
C++ 优化
我正在处理一些现有的 C++ 代码，这些代码看起来写得不好，而且调用频率很高。我想知道我是否应该花时间更改它，或者编译器是否已经在优化问题。我正在使用 Visual Studio 2008。这是一
c++ - 优化
我正在尝试使用 OpenGL 渲染 3 个四边形(1 个背景图，2 个 Sprite )。我有以下代码: void GLRenderer::onDrawObjects(long p_dt) {
Java If 优化
我确实有以下声明: isEnabled = false; if(foo(arg) && isEnabled) { .... } public boolean foo(arg) { some re
SQL 优化
(一)深入浅出理解索引结构实际上，您可以把索引理解为一种特殊的目录。微软的SQL SERVER提供了两种索引：聚集索引(clustered index，也称聚类索引、簇集索引)和非聚集索引(no
CSS 优化、提高性能的方法有哪些？
一、写在前面 css的优化方案，之前没有提及，所以接下来进行总结一下。二、具体优化方案 2.1、加载性能 1、css压缩：将写好的css进行打包，可以减少很多的体积。 2、css单一样式：在需要下边

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

sql - 我如何优化这个组明智的最大 sql 查询？