
sql - LIMIT per group - Google BigQuery / Standard SQL

Reposted. Author: 行者123. Updated: 2023-12-05 08:52:31

I have a table like the following (example here):

CREATE TABLE topics (
name varchar(64),
url varchar(253),
statistic integer,
pubdate timestamp
);

INSERT INTO topics VALUES
('a', 'b', 100, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'c', 110, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'd', 120, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'e', 90, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'f', 80, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'g', 70, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'h', 150, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'i', 50, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'j', 10, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'k', 11, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'l', 12, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'm', 9, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'n', 8, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'o', 7, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'p', 15, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'q', 5, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'r', 2, TIMESTAMP '2014-04-01 00:00:00');

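I want to get the top two rows, ranked by statistic, for each (name, date(pubdate)) combination.

In other words, I want to GROUP BY name, date(pubdate), but without an aggregate function: the result is simply the top two rows of each group by statistic. (So I know it is not really a GROUP BY, but a greatest-n-per-group problem.)

I am using Google BigQuery with Standard SQL. I have looked at some other solutions, but I'm not sure how to get this result in my case.

Desired result: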

name  url  statistic  date

a     b    100        2011-05-16
a     e    90         2011-05-16

a     h    150        2014-04-01
a     d    120        2014-04-01

b     m    9          2011-05-16
b     k    11         2011-05-16

b     l    12         2014-04-01
b     p    15         2014-04-01

Best answer

The following is for BigQuery Standard SQL:

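#standardSQL
SELECT * EXCEPT(arr) FROM (
  -- For each (name, day) group, keep only the two structs with the highest statistic
  SELECT name, DATE(pubdate) day,
    ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) arr
  FROM `project.dataset.table`
  GROUP BY name, day
), UNNEST(arr)  -- flatten each array back into one row per (url, statistic)
-- ORDER BY name, day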

You can test this with the sample data from the question, as in the following example:

#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' name, 'b' url, 100 statistic, TIMESTAMP '2011-05-16 15:36:38' pubdate UNION ALL
SELECT 'a', 'c', 110, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'd', 120, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'e', 90, '2011-05-16 15:36:38' UNION ALL
SELECT 'a', 'f', 80, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'g', 70, '2011-05-16 15:36:38' UNION ALL
SELECT 'a', 'h', 150, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'i', 50, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'j', 10, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'k', 11, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'l', 12, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'm', 9, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'n', 8, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'o', 7, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'p', 15, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'q', 5, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'r', 2, '2014-04-01 00:00:00'
)
SELECT * EXCEPT(arr) FROM (
  SELECT name, DATE(pubdate) day,
    ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) arr
  FROM `project.dataset.table`
  GROUP BY name, day
), UNNEST(arr)
ORDER BY name, day

Result:

Row  name  day         url  statistic
1    a     2011-05-16  b    100
2    a     2011-05-16  e    90
3    a     2014-04-01  h    150
4    a     2014-04-01  d    120
5    b     2011-05-16  k    11
6    b     2011-05-16  m    9
7    b     2014-04-01  p    15
8    b     2014-04-01  l    12
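
For comparison, the same top-two-per-group result can probably also be obtained with a window function instead of ARRAY_AGG. The sketch below is not part of the original answer; it is an alternative shown under stated assumptions. It numbers the rows inside each (name, day) partition by descending statistic and keeps the rows numbered 1 and 2. The alias rn is a name introduced here purely for illustration, and `project.dataset.table` is the same placeholder table reference used in the answer.

```sql
#standardSQL
SELECT name, day, url, statistic
FROM (
  SELECT
    name,
    DATE(pubdate) AS day,
    url,
    statistic,
    -- rn (hypothetical alias): 1 for the highest statistic in the group, 2 for the next, ...
    ROW_NUMBER() OVER (
      PARTITION BY name, DATE(pubdate)
      ORDER BY statistic DESC
    ) AS rn
  FROM `project.dataset.table`
)
WHERE rn <= 2  -- keep only the top two rows of each (name, day) group
ORDER BY name, day, statistic DESC
```

Note that ROW_NUMBER() breaks ties in statistic arbitrarily; if a deterministic pick matters, a second sort key such as url can be added to the window's ORDER BY.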

Regarding sql - LIMIT per group - Google BigQuery / Standard SQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56602480/
