gpt4 book ai didi

sql - SAS proc sql 返回 group by/order by 变量的重复值

转载 作者:行者123 更新时间:2023-12-04 11:27:34 30 4
gpt4 key购买 nike

我有一些相当简单的 SQL,它应该为每个 Assets 每季度提供 1 行。相反,我每组得到多行。

下面是 SQL、SAS 数据步骤和一些输出数据。重复行数(以下数据中为227708)等于Num_borrowers,即asset1的行数。

proc sql outobs=max;

create table table1 as
select
case
when period_dt ='01DEC2003'd then '2003Q4'
when period_dt ='01DEC2004'd then '2004Q4'
when period_dt ='01DEC2005'd then '2005Q4'
when period_dt ='01DEC2006'd then '2006Q4'
when period_dt ='01DEC2007'd then '2007Q4'
when period_dt ='01DEC2008'd then '2008Q4'
when period_dt ='01DEC2009'd then '2009Q4'
when period_dt ='01DEC2010'd then '2010Q4'
when period_dt ='01DEC2011'd then '2011Q4'
when period_dt ='01DEC2012'd then '2012Q4'
when period_dt ='01DEC2013'd then '2013Q4'
when period_dt ='01JUN2014'd then '2014Q2'
end as QTR,
case
when MM_ASSET in ('C&I', 'Foreign', 'Leasing','Scored-WF','Scored-WB') THEN 'C&I'
when MM_ASSET='Construction' THEN 'Construction RE'
when MM_ASSET='Mortgage-IP' THEN 'Income Producing RE'
when MM_ASSET='Mortgage-OO' THEN 'Owner Occupied RE'
when MM_ASSET='Mortgage-SF' THEN 'Mortgage-SF'
when MM_ASSET='Unknown' THEN 'Other'
end as asset1,
count (period_dt) as Num_Borrowers,
exposure,
co_itd,
MM_NINEQTR_LOSS,
MM_LIFE_LOSS
from td_prod.OBLIGOR_COMBINED
where period_dt in ('01DEC2003'd,'01DEC2004'd,'01DEC2005'd,'01DEC2006'd,'01DEC2007'd,'01DEC2008'd, '01DEC2009'd,'01DEC2010'd,'01DEC2011'd,'01DEC2012'd,'01DEC2013'd,'01JUN2014'd)
and mm_asset in ('C&I','Foreign','Leasing','Construction','Mortgage-IP','Scored-WF','Scored-WB'
'Mortgage-OO','Mortgage-SF','Unknown')
group by 1,2
order by 1,2;

quit;



data table2; set table1;

Total_Exposure = exposure/1000000;
if total_exposure = 0 then total_exposure=.;
Total_Charge_Offs =co_itd/1000000;
Total_9Q_Losses = MM_NINEQTR_LOSS/1000000;
Total_Life_Losses = MM_LIFE_LOSS/1000000;
avg_borrower_exp = total_exposure/num_borrowers;
co_rate = total_charge_offs/total_exposure;
life_lossR = Total_life_losses/total_exposure;
nineQtr_lossR = total_9q_losses/total_exposure;

run;



*** sample of output data set ***;
qtr asset1 num_borrowers
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708

最佳答案

意识到我上面的评论更像是一个答案。

在 SAS SQL 中,在带有 group by 子句的查询中,该子句包括 select 语句中的无关列(即不属于 group by 且不是从聚合函数派生的列),SAS 将汇总统计“重新合并”回原始数据(并附有说明)。大多数 SQL 只会抛出错误。下面是一个例子:

data have;
input gender $ age score;
cards;
M 10 100
M 20 200
F 30 300
F 40 400
;
run;

proc sql;
select gender, mean(age) as AvgAge, SCore
from have
group by gender
;
quit;

返回:
 gender      AvgAge     score
F 35 300
F 35 400
M 15 100
M 15 200

在您的代码中,exposure、co_itd、MM_NINEQTR_LOSS 和 MM_LIFE_LOSS 都是无关的列,导致 SAS 重新合并。

每当发生重新合并时,您都会在 SAS 日志中看到以下消息:

注意:查询需要重新合并摘要
统计回原来的
数据。

重新合并数据 SAS documentation on summary-function 中的部分更多细节

关于sql - SAS proc sql 返回 group by/order by 变量的重复值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25538392/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com