I am trying to edit variables in a sas dataset. The dataset has multiple types of variables which repeat for quite a few years, each month. The variables are character variables and have names like 'X_1999_01'. The basic idea is that I want to compare 1-2 dates (date1 and date2 below) that each observation might have with the implied dates represented by the variable name (basically the year and month in the name and assuming the first day of the month - 'refdate' below) and adjust the variable if one or more inequalities/equalities between the dates hold (see code). I've created a simple toy dataset below with this structure (including missingness) and show the macro code I have tried. I simplify the data to 2 years and 3 months, but, irl brute force is not an option. If anyone has ideas how to make the macro work or can suggest an alternative approach, it would be much appreciated.
我正在尝试编辑SAS数据集中的变量。数据集有多种类型的变量,每个月都会重复几年。变量是字符变量,名称类似于‘X_1999_01’。基本思想是,我想要将每个观测可能具有的1-2个日期(下面的日期1和日期2)与变量名称所表示的隐含日期(基本上是名称中的年和月,并假设月份的第一天-下面的‘refdate’)进行比较,如果日期之间存在一个或多个不等/相等(请参见代码),则调整变量。下面我用这个结构创建了一个简单的玩具数据集(包括缺失),并显示了我尝试过的宏代码。我将数据简化为2年零3个月,但是,IRL暴力不是一种选择。如果有人有办法让宏观经济发挥作用,或者能提出替代方案,我们将不胜感激。
data toydata ;
input var1999_01 $ var1999_02 $ var1999_03 $ var2000_01 $
var2000_02 $ var2000_03 $ date1 : date9. date2 : date9. ;
format date1 date2 yymmddn8. ;
datalines ;
. . . no yes yes 01FEB2000 .
yes yes yes no no no 01JUL1998 01JAN2000
;
run ;
/* to see the data */
proc print data=toydata ; run ;
proc contents data=toydata ; run ;
%macro failed ;
data wantdata ;
set toydata ;
%do i=1999 %to 2000 ;
%let year = &i. ;
array var&year. [3] $ var&year._01-var&year._03 ;
%do j=1 %to 3 ;
/*the macro variable refdate is defined based
on the year (i loop) and month (j loop) in the
variable name, assuming 1st day of the month. */
%let refdate = %sysfunc(mdy(&j., 1, &year.)) ;
/* first condition is whether date1 (a date) is
greater than (i.e., after) the reference date
implied by the variable name, e.g., var1999_01
implies 01Jan1999. */
%if date1 > &refdate. %then var&year.[&j.] = 'nye' ;
/* the second condition is whether date 2 (if
not missing) is equal to implied date and
then if it's less than (before) the implied
date */
%if %length(date2) %then %do ;
%if date2 = &refdate. %then var&year.[&j.] = 'event this month' ;
%if date2 < &refdate. %then var&year.[&j.] = 'past event' ;
%end ;
%end ;
%end ;
run ;
%mend ;
%failed
更多回答
I really cannot tell from your description and code what the "condition" is that you are testing.
我真的不能从您的描述和代码中看出您正在测试的“条件”是什么。
Apologies. You are correct. My explanation was confused. I have edited for clarity.
很抱歉。你说得对。我的解释很混乱。为了清楚起见,我对此进行了编辑。
I am sorry I still do not see what condition you are testing. Are you not just trying to access the value of the variable that is implied by the value in the DATE variable? Once you have the value of that implied variable what do you want to do with it? Are planning to read from that variable or write to it? If write what do you want to write? If read what do want to do with the value?
很抱歉,我仍然不知道您正在测试的是什么情况。您不是在尝试访问DATE变量中的值所隐含的变量值吗?一旦你得到了这个隐含变量的值,你想用它做什么?是否计划从该变量中读取或写入该变量?如果要写,你想写什么?如果读了,你想用这个值做什么?
Note this code makes no sense %if date1 > &refdate.
You are asking the macro processor to compare the date value you put into REFDATE
with the string date1
. The macro process cannot see the value of the dataset variable named DATE1 so to it the string date1 is just a string of 5 characters.
注意:如果Date1>&refdate,则此代码没有任何意义。您要求宏处理器将您输入REFDATE的日期值与字符串Date1进行比较。宏进程看不到名为Date1的数据集变量的值,因此对它来说,字符串Date1只是一个5个字符的字符串。
So for each variable of the form var_year_month, e.g., var1999_01, I want to test whether the value of another variable - date1 (a date) - is greater than the date implied by the name, e.g., 1999-01-01 (assuming first of the month). So is date1 > 1999-01-01? If so, I want to change the (character value) of for example var1999_01 (which may be missing) to 'nye' which is short for not yet eligible, but that doesn't really matter. If not, I don't want to change value of variable var1999_01. Similarly for the second set of conditions, I am interested in comparing the variable date2 (also a date).
因此,对于var_Year_Month形式的每个变量,例如var1999_01,我想测试另一个变量-Date1(日期)-的值是否大于名称所暗示的日期,例如1999-01-01(假设是该月的第一个月)。那么,日期1>1999-01-01吗?如果是这样的话,我想要将例如var1999_01(可能缺失)的(字符值)更改为‘Nye’,这是尚未合格的缩写,但这并不重要。如果不是,我不想更改变量var1999_01的值。类似地,对于第二组条件,我感兴趣的是比较变量Date2(也是日期)。
To dynamically access a particular variable from a list of variables in a data step you define an array and index into the array.
要在数据步骤中动态访问变量列表中的特定变量,需要定义一个数组并索引到该数组中。
Since the suffix on the variable names seems to indicate YEAR and MONTH just use INTCK() function to calculate which index to use into the array.
由于变量名上的后缀似乎表示年和月,因此只需使用INTCK()函数来计算要在数组中使用哪个索引。
I cannot tell what you are actually trying to do so let's just make something that captures the value of the EVENT variables (the ones with the month encoded into their names) based on month of the current DATE variable being checked.
我不知道您实际上在尝试做什么,所以让我们只做一些事情来捕获事件变量的值(那些将月份编码到它们的名称中的变量),基于被检查的当前日期变量的月份。
%let basedate='01JAN1999'd;
data want;
set have ;
array events $3 var1999_01-var1999_12 var2000_01-var2000_12 ;
array dates date1-date3 ;
array event_this_month [3] $3 ;
do index=1 to dim(dates);
month = 1 + intck('month',&basedate,dates[index]);
if 1<=month<=dim(events) then
event_this_month[index] = events[month]
;
end;
run;
So if DATE1 is in the month DEC_1999 then the value of EVENT_THIS_MONTH1 will be the value of VAR1999_12.
因此,如果Date1在DEC_1999月份,则EVENT_THIS_MONTH1的值将是VAR1999_12的值。
If instead you mean to use the two date variables to indicate a range of months then perhaps you want to run the index month over the set of values implied by those two dates?
如果您想要使用两个日期变量来指示月份的范围,那么您可能希望在这两个日期所隐含的一组值上运行索引Month?
So to set all of the variables from DATE1 to DATE2 to the value 'yes' you could use a loop like this:
因此,要将从Date1到DATE2的所有变量设置为值‘yes’,您可以使用如下循环:
%let basedate='01JAN1999'd;
data want;
set have ;
array events $3 var1999_01-var1999_12 var2000_01-var2000_12 ;
do month=max(1,intck('month',&basedate,date1))
to min(dim(events),intck('month',&basedate,date2))
;
events[month] = 'yes';
end;
run;
So in the comments you seem to say that you want to take the index based on the MIN of DATE1 and DATE2 and set the array entries AFTER that to 'nye'.
Given your example data perhaps it is more likely you meant to start from the MAX or the two dates?
因此,在注释中,您似乎要根据Date1和DATE2的最小值获取索引,并将其后的数组条目设置为‘Nye’。给出您的示例数据,可能更有可能是从最大日期或两个日期开始?
Let's try it, first lets fix your example data to have contiguous month variables.
让我们试一试,首先让我们将您的示例数据修复为具有连续的月份变量。
data have ;
format date1 date2 yymmdd10. ;
array events $3 var1999_10-var1999_12 var2000_01-var2000_03;
input var1999_10--var2000_03 date1 :date. date2 :date.;
datalines ;
. . . no yes yes 01FEB2000 .
yes yes yes no no no 01JUL1998 01JAN2000
;
Now the base date for the array of date based variables is 01OCT1999
现在,基于日期的变量数组的基准日期为01OCT1999
%let basedate='01OCT1999'd;
data want;
set have ;
array events $3 var1999_10-var1999_12 var2000_01-var2000_03;
do month=max(1,1+intck('month',&basedate,max(date1,date2))) to dim(events);
events[month] = 'nye';
end;
drop month;
run;
Results
结果
I'm struggling to understand your intent also. But usually when you have data values in your variable names (implied dates in this case), that's a sign that life will be easier if you transpose your data into a vertical format. Below takes your data (I added an ID column just for readability), and transposes each row into 6 rows (one per month). Then you should be able to easily do any calculations you want between Date, Date1, and Date2. And you can always transpose back to the wide format if you need that in the end.
我也在努力理解你的意图。但通常当您的变量名中有数据值时(在本例中为隐含日期),这是一个迹象,表明如果您将数据转换为垂直格式,生活将会更容易。下面获取您的数据(我添加了一个ID列只是为了可读性),并将每一行转置为6行(每月一行)。然后,您应该能够轻松地在Date、Date1和Date2之间进行任何计算。如果最后需要的话,你可以随时调换回宽格式。
data toydata ;
input ID var1999_01 $ var1999_02 $ var1999_03 $ var2000_01 $
var2000_02 $ var2000_03 $ date1 : date9. date2 : date9. ;
format date1 date2 yymmddn8. ;
datalines ;
1 . . . no yes yes 01FEB2000 .
2 yes yes yes no no no 01JUL1998 01JAN2000
;
run ;
data vert (keep=ID Date Value Date1 Date2);
set toydata ;
array vars{*} var: ;
do i=1 to dim(vars) ;
Date=input(compress(vname(vars{i}),'var_'),yymmn6.) ;
Value=vars{i} ;
output ;
end ;
format Date: date9. ;
run ;
proc print ;
var Date Value Date1 Date2 ;
by id ;
id id ;
run ;
Returns:
返回:
ID Date Value date1 date
1 01JAN1999 01FEB2000
01FEB1999 01FEB2000
01MAR1999 01FEB2000
01JAN2000 no 01FEB2000
01FEB2000 yes 01FEB2000
01MAR2000 yes 01FEB2000
2 01JAN1999 yes 01JUL1998 01JAN2000
01FEB1999 yes 01JUL1998 01JAN2000
01MAR1999 yes 01JUL1998 01JAN2000
01JAN2000 no 01JUL1998 01JAN2000
01FEB2000 no 01JUL1998 01JAN2000
01MAR2000 no 01JUL1998 01JAN2000
更多回答
我是一名优秀的程序员,十分优秀!