gpt4 book ai didi

sas - SAS 或 PROC SQL 中按组的累积最大值

转载 作者:行者123 更新时间:2023-12-02 22:03:49 30 4
gpt4 key购买 nike

我想计算按另一列分组的累积最大值。

假设我有这些数据:

data have;
input grp $ number;
datalines;
a 3
b 4
a 5
b 2
a 1
b 8
;

我想要的输出是:

data want;
input grp $ cummax;
a 3
b 4
a 5
b 4
a 5
b 8
;

我的真实案例将涉及多个分组列+过滤器,理想情况下,该累积最大值将同时在多个列上计算。

我主要关心的是计算效率,因为我将在十到数亿行的表上运行它。 Proc SQL 或 native SAS 都受欢迎。

如有必要,行可能会被打乱。

系统信息

proc product_status;run;
For Base SAS Software ...
Custom version information: 9.3_M2
Image version information: 9.03.01M2P080112
For SAS/STAT ...
Custom version information: 12.1
Image version information: 9.03.01M0P081512
For SAS/GRAPH ...
Custom version information: 9.3_M2
For SAS/CONNECT ...
Custom version information: 9.3_M2
For SAS OLAP Server ...
Custom version information: 9.3_M1
For SAS Enterprise Miner ...
Custom version information: 12.1
Image version information: 9.03.01M0P081512
For SAS Integration Technologies ...
Custom version information: 9.3_M2
For SAS/ACCESS Interface to Oracle ...
Custom version information: 9.3_M1
For SAS/ACCESS Interface to PC Files ...
Custom version information: 9.3_M2
    proc setinit;run;
Product expiration dates:
---Base SAS Software 31JUL2018
---SAS/STAT 31JUL2018
---SAS/GRAPH 31JUL2018
---SAS/CONNECT 31JUL2018
---SAS OLAP Server 31JUL2018
---SAS Enterprise Miner 31JUL2018
---MDDB Server common products 31JUL2018
---SAS Integration Technologies 31JUL2018
---SAS Enterprise Miner Server 31JUL2018
---SAS Enterprise Miner Client 31JUL2018
---Unused OLAP Slot 31JUL2018
---SAS Enterprise Guide 31JUL2018
---SAS/ACCESS Interface to Oracle 31JUL2018
---SAS/ACCESS Interface to PC Files 31JUL2018
---SAS Metadata Bridges for Informatica 31JUL2018
---SAS Metadata Bridges for Microsoft SQL Server 31JUL2018
---SAS Metadata Bridge for Oracle 31JUL2018
---SAS Workspace Server for Local Access 31JUL2018
---SAS Workspace Server for Enterprise Access 31JUL2018
---SAS Table Server 31JUL2018
---DataFlux Trans DB Driver 31JUL2018
---SAS Framework Data Server 31JUL2018
---SAS Add-in for Microsoft Excel 31JUL2018
---SAS Add-in for Microsoft Outlook 31JUL2018
---SAS Add-in for Microsoft PowerPoint 31JUL2018
---SAS Add-in for Microsoft Word 31JUL2018

最佳答案

使用HASH对象来存储每个变量和组组合的最大值。这将允许您单次遍历数据集并编写一些可以根据组和变量数量进行扩展的代码。

这不需要对大型数据集进行昂贵的排序。

测试数据

data example;
format grp1-grp5 $1.;
array grp[5];
array val[5];
do rows=1 to 1000000;
do i=1 to 5;
r = ceil(ranuni(1)*5);
grp[i] = substr("ABCDE",r,1);
end;
do j=1 to 5;
val[j] = 10*rannor(1);
end;
output;
end;
keep grp: val:;
run;

计算累积最大值的数据步骤

data want;
set example;
array val[5];
array max[5];
if _n_ = 1 then do;
declare hash mx();
rc = mx.defineKey('grp1','grp2','grp3','grp4','grp5');
rc = mx.definedata('max1','max2','max3','max4','max5');
rc = mx.definedone();
end;

rc = mx.find();
/*No Max for this combination -- add it*/
if rc then do;
do i=1 to 5;
max[i] = val[i];
end;
end;

/*Update Max Values*/
do i=1 to 5;
if val[i] > max[i] then
max[i] = val[i];
end;

/*Update Hash*/
rc = mx.replace();

drop rc i;
n = _n_; /*This is for testing*/
run;

使用测试变量n,我们可以对保持原始顺序的组进行排序,看看它是否有效。 (提示,确实如此)。

proc sort data=want;
by grp: n;
run;

关于sas - SAS 或 PROC SQL 中按组的累积最大值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48648946/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com