sql - Window function for selecting the name of the majority row in Redshift Postgres

Reposted. Author: 行者123. Updated: 2023-12-04 20:56:10

I have a dataset like this, where some rows are useful but others are corrupted.

create table pages (
page varchar,
cat varchar,
hits int
);

insert into pages values
(1, 'asdf', 1),
(1, 'fdsa', 2),
(1, 'Apples', 321),
(2, 'gwegr', 30),
(2, 'hsgsdf', 2),
(2, 'Bananas', 321);

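I want to know the correct category and the total hits for each page. The correct category is the one with the most hits. I want a result set like this: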

page | category | sum_of_hits
-----------------------------
1 | Apples | 324
2 | Bananas | 353

The furthest I have got is:

SELECT page,
last_value(cat) over (partition BY page ORDER BY hits) as category,
sum(hits) as sum_of_hits
FROM pages
GROUP BY 1, 2

But it fails with: ERROR: column "pages.hits" must appear in the GROUP BY clause or be used in an aggregate function Position: 83

I tried putting hits inside an aggregate, as in ORDER BY max(hits), but that makes no sense and is not what I want.

Fiddle: http://sqlfiddle.com/#!17/cb3c2/17

Best answer

There are two problems here:

The first is the use of last_value. Read the note about last_value in the Postgres documentation:

Note that first_value, last_value, and nth_value consider only the rows within the "window frame", which by default contains the rows from the start of the partition through the last peer of the current row. This is likely to give unhelpful results for nth_value and particularly last_value. You can redefine the frame as being the whole partition by adding ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to the OVER clause. See Section 4.2.8 for more information.
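The effect of the frame described in the note can be seen by computing last_value twice over the same partition. This is a minimal sketch against the pages table from the question; the column aliases default_frame and full_frame are illustrative names, and it targets Postgres as in the fiddle (Redshift behavior is not verified here).

```sql
-- Compare last_value under the default frame with the whole-partition frame.
-- Assumes the pages table from the question; default_frame and full_frame
-- are illustrative aliases, not names from the original post.
SELECT page,
       cat,
       hits,
       -- Default frame ends at the last peer of the current row, so this
       -- returns the current row's own cat when hits has no ties.
       last_value(cat) OVER (PARTITION BY page ORDER BY hits) AS default_frame,
       -- Frame widened to the whole partition: every row of a page gets
       -- the cat with the highest hits.
       last_value(cat) OVER (
           PARTITION BY page
           ORDER BY hits
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS full_frame
FROM pages
ORDER BY page, hits;
```

On the sample data, default_frame simply echoes cat, while full_frame is Apples for every row of page 1 and Bananas for every row of page 2.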

I suggest switching to first_value with a descending sort instead:

SELECT page,
first_value(cat) over (partition BY page ORDER BY hits DESC) as category,
hits
FROM pages

The second problem is that you cannot use a window function directly in the GROUP BY clause; you need a subquery or a CTE:

SELECT page,
       category,
       sum(hits) AS sum_of_hits
FROM (
    -- Label every row with the category that has the most hits on its page
    SELECT page,
           first_value(cat) OVER (PARTITION BY page ORDER BY hits DESC) AS category,
           hits
    FROM pages
) a
GROUP BY 1, 2
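The same query can also be written with a CTE, the other option mentioned above. This is a sketch rather than part of the original answer: the CTE name labelled is a hypothetical name chosen for illustration, and the explicit frame clause is added only to spell out that the whole partition is considered.

```sql
-- CTE form of the subquery approach. "labelled" is a hypothetical CTE name;
-- the pages table and its columns come from the question.
WITH labelled AS (
    SELECT page,
           -- Category with the most hits on each page, attached to every row
           first_value(cat) OVER (
               PARTITION BY page
               ORDER BY hits DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS category,
           hits
    FROM pages
)
SELECT page,
       category,
       sum(hits) AS sum_of_hits
FROM labelled
GROUP BY page, category
ORDER BY page;
```

With the sample rows from the question this should return Apples with 324 for page 1 and Bananas with 353 for page 2, matching the desired output. If two categories on a page tie for the most hits, which one first_value picks is not determined by this ordering alone; adding a tiebreaker column to the ORDER BY would make it deterministic.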


This question, "sql - Window function for selecting the name of the majority row in Redshift Postgres", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/46534658/
