
sql - Select the first and last date in a window

Reposted. Author: 行者123. Updated: 2023-11-29 12:45:13

I'm trying to select the first and last date in a window, based on the month and year of the supplied date.

Here is some sample data:

F.rates
| id | c_id | date | rate |
---------------------------------
| 1 | 1 | 01-01-1991 | 1 |
| 1 | 1 | 15-01-1991 | 0.5 |
| 1 | 1 | 30-01-1991 | 2 |
.................................
| 1 | 1 | 01-11-2014 | 1 |
| 1 | 1 | 15-11-2014 | 0.5 |
| 1 | 1 | 30-11-2014 | 2 |

Here is the pgSQL SELECT I came up with:

SELECT c_id, first_value(date) OVER w, last_value(date) OVER w FROM F.rates 
WINDOW w AS (PARTITION BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), c_id
ORDER BY date ASC)

This gives me a result that is very close to what I want:

| c_id | first_date | last_date  |
----------------------------------
| 1 | 01-01-1991 | 15-01-1991 |
| 1 | 01-01-1991 | 30-01-1991 |
.................................

It should be:

| c_id | first_date | last_date  |
----------------------------------
| 1 | 01-01-1991 | 30-01-1991 |
.................................

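For some reason, last_value(date) returns every record in the window. This makes me think I'm misunderstanding how windows work in SQL. It's as if SQL forms a new window for each row it iterates over, instead of forming several windows over the whole table based on YEAR and MONTH.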

Could someone explain where I'm going wrong? And how can I get the result I want?

There is a reason I'm not using MAX/MIN with a GROUP BY clause: my next step is to retrieve the rates associated with the dates I selected, for example:

| c_id | first_date | last_date  | first_rate | last_rate  | avg rate |
-----------------------------------------------------------------------
| 1 | 01-01-1991 | 30-01-1991 | 1 | 2 | 1.1 |
.......................................................................

Best answer

If you want the output grouped into a single row (or fewer rows), you should use simple aggregation (i.e. GROUP BY), provided avg_rate is enough for you:

SELECT c_id, min(date), max(date), avg(rate)
FROM F.rates
GROUP BY c_id, date_trunc('month', date)
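The aggregation above can be tried without a PostgreSQL server. The sketch below is a stand-in built on several assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, stores only the six rows visible in the question with ISO dates (YYYY-MM-DD) rather than DD-MM-YYYY, attaches an in-memory database named F so the table can keep the name F.rates, and replaces date_trunc('month', date) with SQLite's strftime('%Y-%m', date). The names ROWS, make_db and monthly_summary are hypothetical.

```python
import sqlite3

# The six rows visible in the question. Dates are ISO-8601 text so that
# string order equals chronological order (SQLite has no DATE type).
ROWS = [
    (1, 1, "1991-01-01", 1.0), (1, 1, "1991-01-15", 0.5), (1, 1, "1991-01-30", 2.0),
    (1, 1, "2014-11-01", 1.0), (1, 1, "2014-11-15", 0.5), (1, 1, "2014-11-30", 2.0),
]


def make_db() -> sqlite3.Connection:
    """In-memory database with an attached schema "F" that holds F.rates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS F")
    conn.execute("CREATE TABLE F.rates (id INTEGER, c_id INTEGER, date TEXT, rate REAL)")
    conn.executemany("INSERT INTO F.rates VALUES (?, ?, ?, ?)", ROWS)
    return conn


def monthly_summary(conn: sqlite3.Connection) -> list:
    # strftime('%Y-%m', date) plays the role of date_trunc('month', date).
    return conn.execute(
        """
        SELECT c_id, min(date), max(date), avg(rate)
        FROM F.rates
        GROUP BY c_id, strftime('%Y-%m', date)
        ORDER BY c_id, min(date)
        """
    ).fetchall()


if __name__ == "__main__":
    for row in monthly_summary(make_db()):
        print(row)
```

Each month collapses to one row; for these sample rows the average is (1 + 0.5 + 2) / 3, about 1.167. What GROUP BY cannot give you directly is the rate that belongs to the min or max date, which is what the edit below addresses.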

For more on window functions, see PostgreSQL's documentation:

But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.

...

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.

...

There are options to define the window frame in other ways ... See Section 4.2.8 for details.
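The frame rule quoted above explains the question's output: with ORDER BY and no frame clause, last_value only sees rows up to the current one. The sketch below demonstrates this under the same assumptions as before (Python's sqlite3 as a stand-in for PostgreSQL, SQLite 3.25 or newer for window functions, strftime('%Y-%m', date) in place of the two EXTRACT calls, hypothetical names ROWS, make_db, DEFAULT_FRAME and FULL_FRAME). It runs the question's window once with the default frame, reproducing the growing last_date, and once with an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame plus DISTINCT, which gives one row per month including first/last rates.

```python
import sqlite3

ROWS = [
    (1, 1, "1991-01-01", 1.0), (1, 1, "1991-01-15", 0.5), (1, 1, "1991-01-30", 2.0),
    (1, 1, "2014-11-01", 1.0), (1, 1, "2014-11-15", 0.5), (1, 1, "2014-11-30", 2.0),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS F")
    conn.execute("CREATE TABLE F.rates (id INTEGER, c_id INTEGER, date TEXT, rate REAL)")
    conn.executemany("INSERT INTO F.rates VALUES (?, ?, ?, ?)", ROWS)
    return conn


# The question's window: ORDER BY without a frame clause, so each row's frame
# runs from the partition start to the current row (plus ORDER BY peers).
DEFAULT_FRAME = """
    SELECT c_id, first_value(date) OVER w, last_value(date) OVER w
    FROM F.rates
    WINDOW w AS (PARTITION BY c_id, strftime('%Y-%m', date) ORDER BY date)
    ORDER BY date
"""

# Frame spanning the whole partition: every row of a month gets identical
# values, so DISTINCT collapses each month into a single output row.
FULL_FRAME = """
    SELECT DISTINCT c_id,
           first_value(date) OVER w AS first_date,
           last_value(date)  OVER w AS last_date,
           first_value(rate) OVER w AS first_rate,
           last_value(rate)  OVER w AS last_rate,
           avg(rate)         OVER w AS avg_rate
    FROM F.rates
    WINDOW w AS (PARTITION BY c_id, strftime('%Y-%m', date) ORDER BY date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY c_id, first_date
"""

if __name__ == "__main__":
    conn = make_db()
    print(conn.execute(DEFAULT_FRAME).fetchall())
    print(conn.execute(FULL_FRAME).fetchall())
```

PostgreSQL accepts the same frame clause inside a WINDOW definition, so adding it to the question's query should have the same effect there. The DISTINCT step works only because the full frame makes every column identical within a month.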

Edit:

If you want to collapse your data (with min/max aggregates) and also collect more columns than the ones listed in GROUP BY, you have 2 options:

The SQL way

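Select the min/max values in a subquery, then join their original rows back (with this approach, though, you have to deal with the fact that the min/max values are not guaranteed to be unique):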

SELECT agg.c_id,                -- qualified: c_id also exists in "first" and "last"
       agg.min    AS first_date, -- an unaliased min(date) is named "min" in PostgreSQL
       agg.max    AS last_date,
       first.rate AS first_rate,
       last.rate  AS last_rate,
       agg.avg    AS avg_rate
FROM (SELECT c_id, min(date), max(date), avg(rate)
      FROM F.rates
      GROUP BY c_id, date_trunc('month', date)) agg
JOIN F.rates first ON agg.c_id = first.c_id AND agg.min = first.date
JOIN F.rates last  ON agg.c_id = last.c_id  AND agg.max = last.date
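The join-back approach can be sketched in the same hypothetical sqlite3 setup (Python's sqlite3 instead of PostgreSQL, strftime('%Y-%m', date) instead of date_trunc, illustrative names ROWS, make_db and JOIN_BACK). Here the subquery columns are aliased explicitly, so the outer query does not depend on engine-specific default column names. As the answer warns, if two rows share the min or max date for the same c_id, the joins return more than one row for that month.

```python
import sqlite3

ROWS = [
    (1, 1, "1991-01-01", 1.0), (1, 1, "1991-01-15", 0.5), (1, 1, "1991-01-30", 2.0),
    (1, 1, "2014-11-01", 1.0), (1, 1, "2014-11-15", 0.5), (1, 1, "2014-11-30", 2.0),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS F")
    conn.execute("CREATE TABLE F.rates (id INTEGER, c_id INTEGER, date TEXT, rate REAL)")
    conn.executemany("INSERT INTO F.rates VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Aggregate per (c_id, month), then join the rows holding the min and max
# dates back in to recover their rates.
JOIN_BACK = """
    SELECT agg.c_id, agg.first_date, agg.last_date,
           f.rate AS first_rate, l.rate AS last_rate, agg.avg_rate
    FROM (SELECT c_id, min(date) AS first_date, max(date) AS last_date,
                 avg(rate) AS avg_rate
          FROM F.rates
          GROUP BY c_id, strftime('%Y-%m', date)) AS agg
    JOIN F.rates AS f ON f.c_id = agg.c_id AND f.date = agg.first_date
    JOIN F.rates AS l ON l.c_id = agg.c_id AND l.date = agg.last_date
    ORDER BY agg.c_id, agg.first_date
"""

if __name__ == "__main__":
    for row in make_db().execute(JOIN_BACK):
        print(row)
```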

PostgreSQL's DISTINCT ON

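DISTINCT ON is commonly used for this kind of task, but it depends heavily on the sort order (this way you can only look up one extreme value at a time):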

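-- PostgreSQL requires the DISTINCT ON expressions to match the leftmost
-- ORDER BY expressions; the trailing "date" then keeps the earliest row.
SELECT DISTINCT ON (c_id, date_trunc('month', date))
       c_id,
       date AS first_date,
       rate AS first_rate
FROM F.rates
ORDER BY c_id, date_trunc('month', date), date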
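SQLite has no DISTINCT ON, so the sketch below swaps in a different, commonly used technique: rank rows with row_number() inside each (c_id, month) partition and keep rank 1. It assumes the same hypothetical sqlite3 setup as the other sketches (ROWS, make_db, and the helper extreme_per_month are illustrative names). Like DISTINCT ON, it finds one extreme per query; flipping the sort direction picks the last row instead of the first.

```python
import sqlite3

ROWS = [
    (1, 1, "1991-01-01", 1.0), (1, 1, "1991-01-15", 0.5), (1, 1, "1991-01-30", 2.0),
    (1, 1, "2014-11-01", 1.0), (1, 1, "2014-11-15", 0.5), (1, 1, "2014-11-30", 2.0),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS F")
    conn.execute("CREATE TABLE F.rates (id INTEGER, c_id INTEGER, date TEXT, rate REAL)")
    conn.executemany("INSERT INTO F.rates VALUES (?, ?, ?, ?)", ROWS)
    return conn


def extreme_per_month(conn: sqlite3.Connection, last: bool = False) -> list:
    # Only one of two fixed keywords is interpolated, never user input.
    direction = "DESC" if last else "ASC"
    # row_number() restarts at 1 in every (c_id, month) partition; rank 1 is
    # the earliest date (ASC) or the latest date (DESC).
    sql = f"""
        SELECT c_id, date, rate
        FROM (SELECT c_id, date, rate,
                     row_number() OVER (PARTITION BY c_id, strftime('%Y-%m', date)
                                        ORDER BY date {direction}) AS rn
              FROM F.rates) AS ranked
        WHERE rn = 1
        ORDER BY c_id, date
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(extreme_per_month(conn))
    print(extreme_per_month(conn, last=True))
```

The same row_number() query also runs on PostgreSQL (with date_trunc in place of strftime); there, DISTINCT ON is the shorter, engine-specific alternative.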

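You can combine this query with other aggregating subqueries over F.rates, but at that point (if you really need both the min and the max, and in your case even the average) the SQL way is the better fit.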

A similar question about sql - Select the first and last date in a window can be found on Stack Overflow: https://stackoverflow.com/questions/27231299/
