sql - Recursive SQL statement (PostgreSQL 9.1.4)

Reposted. Author: 行者123. Updated: 2023-11-29 11:16:20

PostgreSQL 9.1

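Business situation

Every month, a new batch of accounts is assigned to a specific process. Each batch can be described by month, number of accounts, and total balance of the accounts. The goal of the process is to recover part of the balance from customers. Each batch is then tracked separately on a monthly basis (the amount recovered each month since the batch was transferred to the process).

Goal

My goal is to predict how much will be recovered in the future.

Data definition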

create table vintage_data (
    granularity        date,    /* Month when the accounts entered the process */
    distance_in_months integer, /* Distance in months from the date the accounts entered the process */
    entry_accounts     integer, /* Number of accounts that entered the process in a given month */
    entry_amount       numeric, /* Total amount for accounts that entered the process in a given month */
    recovery_amount    numeric  /* Amount recovered in the Nth month on accounts that entered the process in a given month */
);

Sample data

insert into vintage_data values('2012-01-31',1,200,100000,1000);
insert into vintage_data values('2012-01-31',2,200,100000,2000);
insert into vintage_data values('2012-01-31',3,200,100000,3000);
insert into vintage_data values('2012-01-31',4,200,100000,3500);
insert into vintage_data values('2012-01-31',5,200,100000,3400);
insert into vintage_data values('2012-01-31',6,200,100000,3300);
insert into vintage_data values('2012-02-28',1,250,150000,1200);
insert into vintage_data values('2012-02-28',2,250,150000,1600);
insert into vintage_data values('2012-02-28',3,250,150000,1800);
insert into vintage_data values('2012-02-28',4,250,150000,1200);
insert into vintage_data values('2012-02-28',5,250,150000,1600);
insert into vintage_data values('2012-03-31',1,200,90000,1300);
insert into vintage_data values('2012-03-31',2,200,90000,1200);
insert into vintage_data values('2012-03-31',3,200,90000,1400);
insert into vintage_data values('2012-03-31',4,200,90000,1000);
insert into vintage_data values('2012-04-30',1,300,180000,1600);
insert into vintage_data values('2012-04-30',2,300,180000,1500);
insert into vintage_data values('2012-04-30',3,300,180000,4000);
insert into vintage_data values('2012-05-31',1,400,225000,2200);
insert into vintage_data values('2012-05-31',2,400,225000,6000);
insert into vintage_data values('2012-06-30',1,100,60000,1000);

Calculation process

You can picture the data as a triangular matrix (the X values are the ones to be predicted):

distance_in_months                        1      2      3      4      5      6
granularity entry_accounts  entry_amount
2012-01-31  200             100000        1000   2000   3000   3500   3400   3300
2012-02-28  250             150000        1200   1600   1800   1200   1600   (X-1)
2012-03-31  200             90000         1300   1200   1400   1000   (X0)   (X4)
2012-04-30  300             180000        1600   1500   4000   (X1)   (X5)   (X8)
2012-05-31  400             225000        2200   6000   (X2)   (X6)   (X9)   (X11)
2012-06-30  100             60000         1000   (X3)   (X7)   (X10)  (X12)  (X13)
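
For orientation, the triangle above can be reproduced directly from vintage_data without any extension. The following is only a sketch using plain conditional aggregation; the column aliases d1 to d6 are chosen here for illustration. Cells that are still to be predicted come back as NULL.

```sql
-- Pivot the long-format rows into one row per batch.
-- Missing (future) cells are NULL because no CASE branch matches.
SELECT granularity
     , entry_accounts
     , entry_amount
     , sum(CASE WHEN distance_in_months = 1 THEN recovery_amount END) AS d1
     , sum(CASE WHEN distance_in_months = 2 THEN recovery_amount END) AS d2
     , sum(CASE WHEN distance_in_months = 3 THEN recovery_amount END) AS d3
     , sum(CASE WHEN distance_in_months = 4 THEN recovery_amount END) AS d4
     , sum(CASE WHEN distance_in_months = 5 THEN recovery_amount END) AS d5
     , sum(CASE WHEN distance_in_months = 6 THEN recovery_amount END) AS d6
FROM   vintage_data
GROUP  BY granularity, entry_accounts, entry_amount
ORDER  BY granularity;
```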

Algorithm

My goal is to predict all the missing points (the future). To illustrate the process, here is the calculation for point X1.

1) Get the row totals for the three previous months using distances up to 4:

2012-01-31 1000+2000+3000+3500=9500 (d4m3)
2012-02-28 1200+1600+1800+1200=5800 (d4m2)
2012-03-31 1300+1200+1400+1000=4900 (d4m1)

2) Get the row totals for the three previous months using distances up to 3:

2012-01-31 1000+2000+3000=6000 (d3m3)
2012-02-28 1200+1600+1800=4600 (d3m2)
2012-03-31 1300+1200+1400=3800 (d3m1)

3) Calculate the weighted average run rate for distance 3 and distance 4 (weighted by entry_amount):

(d4m3+d4m2+d4m1)/(100000+150000+90000) = (9500+5800+4900)/(100000+150000+90000) = 20200/340000 = 0.0594
(d3m3+d3m2+d3m1)/(100000+150000+90000) = (6000+4600+3800)/(100000+150000+90000) = 14400/340000 = 0.0424

4) Calculate the change between distance 3 and distance 4:

((d4m3+d4m2+d4m1)/(100000+150000+90000))/((d3m3+d3m2+d3m1)/(100000+150000+90000)) =
= (20200/340000)/(14400/340000) =
= 0.0594/0.0424 = 1.403 (PredictionRateForX1)

5) Calculate the row total for the prediction month using distances up to 3:

2012-04-30 1600+1500+4000=7100

6) Calculate the rate using the entry_amount of the prediction month:

7100/180000 = 0.0394

7) Calculate the predicted rate for X1:

0.0394 * PredictionRateForX1 = 0.05534

8) Calculate the amount for X1:

(0.05534-0.0394)*180000 = 2869.2
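
The eight steps can be checked against the sample data with a single query. This is a sketch, not part of the original post: the CTE names prev, hist and target are made up for illustration, and the prediction month is hard-coded. The sum of entry_amount appears in both the numerator and the denominator of step 4, so it cancels and is left out. Because the query does not round intermediate results, it returns about 2859.72 for X1 rather than the hand-calculated 2869.2.

```sql
WITH prev AS (     -- the three batches preceding the prediction month
    SELECT DISTINCT granularity
    FROM   vintage_data
    WHERE  granularity < DATE '2012-04-30'
    ORDER  BY granularity DESC
    LIMIT  3
), hist AS (       -- steps 1 and 2: totals up to distance 4 and up to distance 3
    SELECT sum(v.recovery_amount) AS total_d4
         , sum(CASE WHEN v.distance_in_months <= 3
                    THEN v.recovery_amount END) AS total_d3
    FROM   vintage_data v
    JOIN   prev p USING (granularity)
    WHERE  v.distance_in_months <= 4
), target AS (     -- step 5: total of the prediction month up to distance 3
    SELECT max(entry_amount)    AS entry_amount
         , sum(recovery_amount) AS total_d3
    FROM   vintage_data
    WHERE  granularity = DATE '2012-04-30'
    AND    distance_in_months <= 3
)
SELECT round(h.total_d4 / h.total_d3, 4)                    AS prediction_rate  -- steps 3 and 4
     , round(t.total_d3 / t.entry_amount, 4)                AS current_rate     -- step 6
     , round(t.total_d3 * (h.total_d4 / h.total_d3 - 1), 2) AS x1_amount        -- steps 7 and 8
FROM   hist h, target t;
```

Multiplying the rate difference by entry_amount in step 8 undoes the division in step 6, which is why x1_amount can be computed from the row total alone.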

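Question

The question is how to calculate the rest of the matrix (from X-1 to X13) with a SQL statement. Evidently this will require some kind of recursive algorithm.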

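Best answer

This is a big task; split it up to make it more manageable. I would put it into a plpgsql function with RETURNS TABLE:

  1. Create a temporary table for your "Calculation process" matrix using a crosstab query. You need the tablefunc module installed for this. Run (once per database):

    CREATE EXTENSION tablefunc;
  2. Update the temp table field by field.

  3. Return the table.

The following demo is fully functional and tested with PostgreSQL 9.1.4. It is based on the table definition provided in the question: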
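
Whether the module is already present can be checked in the pg_extension catalog before running the function. A small sketch, assuming PostgreSQL 9.1 or later (where CREATE EXTENSION and pg_extension are available):

```sql
-- Install tablefunc only if it is missing (no error if it already exists).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Confirm the installation; returns one row when the module is available.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```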

-- DROP FUNCTION f_forcast();

CREATE OR REPLACE FUNCTION f_forcast()
  RETURNS TABLE (
    granularity    date
   ,entry_accounts numeric
   ,entry_amount   numeric
   ,d1             numeric
   ,d2             numeric
   ,d3             numeric
   ,d4             numeric
   ,d5             numeric
   ,d6             numeric) AS
$BODY$
BEGIN

--== Create temp table with result of crosstab() ==--

CREATE TEMP TABLE matrix ON COMMIT DROP AS
SELECT *
FROM   crosstab (
        'SELECT granularity, entry_accounts, entry_amount
               ,distance_in_months, recovery_amount
         FROM   vintage_data
         ORDER  BY 1, 2',

        'SELECT DISTINCT distance_in_months
         FROM   vintage_data
         ORDER  BY 1')
AS tbl (
    granularity    date
   ,entry_accounts numeric
   ,entry_amount   numeric
   ,d1             numeric
   ,d2             numeric
   ,d3             numeric
   ,d4             numeric
   ,d5             numeric
   ,d6             numeric
   );

ANALYZE matrix; -- update statistics to help calculations


--== Calculations ==--

-- I implemented the first calculation for X1 and leave the rest to you.
-- Can probably be generalized in a loop or even a single statement.

UPDATE matrix m
SET    d4 = (
    SELECT (sum(x.d1) + sum(x.d2) + sum(x.d3) + sum(x.d4))
          /(sum(x.d1) + sum(x.d2) + sum(x.d3)) - 1
        -- removed redundant sum(entry_amount) from equation
    FROM  (
        SELECT *
        FROM   matrix a
        WHERE  a.granularity < m.granularity
        ORDER  BY a.granularity DESC
        LIMIT  3
        ) x
    ) * (m.d1 + m.d2 + m.d3)
WHERE  m.granularity = '2012-04-30';

--- Next update X2 ..


--== Return results ==--

RETURN QUERY
TABLE  matrix
ORDER  BY 1;

END;
$BODY$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_forcast();

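I have simplified quite a bit, removing some redundant steps from the calculation.
The solution employs a number of advanced techniques. You need to know your way around PostgreSQL to work with it.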
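
The function only fills X1 and leaves the remaining cells open. One way the loop hinted at in its comments might look is sketched below. It is not part of the original answer and rests on several assumptions: it works on a long-format copy of the data (a hypothetical temp table named forecast) instead of the crosstab matrix, it fills the triangle column by column rather than in the X-1 to X13 order, it uses fewer than three preceding batches where fewer exist, and it rounds each predicted amount to two decimals. Predictions from earlier iterations feed into later ones, which is where the recursive character of the problem shows up.

```sql
-- Working copy in long format; "predicted" marks forecast cells.
CREATE TEMP TABLE forecast AS
SELECT granularity, distance_in_months, entry_amount, recovery_amount
     , false AS predicted
FROM   vintage_data;

DO
$$
DECLARE
    _dist  int;      -- distance (column) currently being filled
    _batch record;   -- batch (row) currently being filled
BEGIN
    FOR _dist IN 2 .. 6 LOOP
        FOR _batch IN                 -- batches missing this distance, oldest first
            SELECT f.granularity
                 , max(f.entry_amount)    AS entry_amount
                 , sum(f.recovery_amount) AS total_so_far
            FROM   forecast f
            GROUP  BY f.granularity
            HAVING max(f.distance_in_months) = _dist - 1
            ORDER  BY f.granularity
        LOOP
            -- Same formula as for X1: own total so far times the relative
            -- growth observed in the (up to) three preceding batches.
            INSERT INTO forecast
            SELECT _batch.granularity, _dist, _batch.entry_amount
                 , round(_batch.total_so_far
                         * (sum(h.recovery_amount)
                            / sum(CASE WHEN h.distance_in_months < _dist
                                       THEN h.recovery_amount END) - 1), 2)
                 , true
            FROM   forecast h
            WHERE  h.distance_in_months <= _dist
            AND    h.granularity IN (
                       SELECT DISTINCT p.granularity
                       FROM   forecast p
                       WHERE  p.granularity < _batch.granularity
                       ORDER  BY p.granularity DESC
                       LIMIT  3);
        END LOOP;
    END LOOP;
END
$$;

-- Inspect the completed triangle in long format.
SELECT granularity, distance_in_months, recovery_amount, predicted
FROM   forecast
ORDER  BY granularity, distance_in_months;
```

For the cell X1 (granularity 2012-04-30, distance 4) this produces the same value as the UPDATE in the function. Filling column by column guarantees that every value a cell depends on, observed or predicted, already exists when the cell is computed.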

Regarding sql - Recursive SQL statement (PostgreSQL 9.1.4), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/11558328/
