Wednesday, December 3, 2008

A Trap in SAS Macro Programming

When using SAS macro, beginners will easily fall into this trap if they do not understand that SAS macro is somewhat just a text substitution.

Look at this example. There is a database of 10 persons created using this program :
option mprint source2 noovp ;

data person ;
     length name  $20. ;
     length sex   $1.  ;

     name= 'Angelina'  ; sex= 'F' ; output ;
     name= 'Babara'    ; sex= 'F' ; output ;
     name= 'Clara'     ; sex= 'F' ; output ;
     name= 'Diana'     ; sex= 'F' ; output ;
     name= 'Eva'       ; sex= 'F' ; output ;
     name= 'Frankie'   ; sex= 'M' ; output ;
     name= 'George'    ; sex= 'M' ; output ;
     name= 'Henry'     ; sex= 'M' ; output ;
     name= 'Ivan'      ; sex= 'M' ; output ;
     name= 'John'      ; sex= 'M' ; output ;
run;
After execution, the table person contains these records:
Obs    name        sex

  1    Angelina     F
  2    Babara       F
  3    Clara        F
  4    Diana        F
  5    Eva          F
  6    Frankie      M
  7    George       M
  8    Henry        M
  9    Ivan         M
 10    John         M

Then, three more fields (initial, wear_tie and wear_bra) are added to the table. Records of male and female are with different initialization. The initial is 'Mr.' for all males and 'Miss' for all females. All male records will have wear_tie=Y and wear_bra=N whereas all female records are just the opposite (i.e. wear_tie=N and wear_bra=Y).

For the first thought, here is the program using macro for the initialization :
%macro init_male ;
       initial=  'Mr.' ;
       wear_tie= 'Y'   ;
       wear_bra= 'N'   ;
%mend;

%macro init_female ;
       initial=  'Miss' ;
       wear_tie= 'N'    ;
       wear_bra= 'Y'    ;
%mend;

data person_1 ;
     set person ;
     length initial  $4. 
            wear_tie $1.
            wear_bra $1.     
            ;
     if sex = 'M' then %init_male   ;
     if sex = 'F' then %init_female ;
run;

proc print data=person_1 ;
quit;

The program design seems very straight forward. The macro init_male is for male record initialization and the macro init_female is for all female record initialization.

But, the outcome is not what we expected. Here is the output table person_1 :
Obs    name        sex    initial   wear_tie   wear_bra

  1    Angelina     F      Miss        N          Y
  2    Babara       F      Miss        N          Y
  3    Clara        F      Miss        N          Y
  4    Diana        F      Miss        N          Y
  5    Eva          F      Miss        N          Y
  6    Frankie      M      Mr.         N          Y
  7    George       M      Mr.         N          Y
  8    Henry        M      Mr.         N          Y
  9    Ivan         M      Mr.         N          Y
 10    John         M      Mr.         N          Y
The fields wear_tie and wear_bra are initialized incorrectly for all male records. The male records should have wear_tie=Y and wear_bra=N, but they are not.
How about re-arranging the 2 macro lines like this:
data person_2 ;
     set person ;
     length initial  $4. 
            wear_tie $1.
            wear_bra $1.     
            ;
     if sex = 'F' then %init_female ;
     if sex = 'M' then %init_male   ;
run;

proc print data=person_2 ;
quit;
Now, the output table person_2 looks:
Obs    name        sex    initial   wear_tie   wear_bra

  1    Angelina     F      Miss        Y          N
  2    Babara       F      Miss        Y          N
  3    Clara        F      Miss        Y          N
  4    Diana        F      Miss        Y          N
  5    Eva          F      Miss        Y          N
  6    Frankie      M      Mr.         Y          N
  7    George       M      Mr.         Y          N
  8    Henry        M      Mr.         Y          N
  9    Ivan         M      Mr.         Y          N
 10    John         M      Mr.         Y          N
This time, the result is wrong for all female records.

Actually, both programs have fall into the trap of SAS macro. The correct program should be written like this :
data person_3 ;
     set person ;
     length initial    $4. 
            wear_tie $1.
            wear_bra   $1.     
            ;
     if sex = 'M' then do; %init_male   ; end;
     if sex = 'F' then do; %init_female ; end;
run;

proc print data=person_3 ;
quit;
Then, the table person_3 will have these records :
Obs    name        sex    initial   wear_tie   wear_bra

  1    Angelina     F      Miss        N          Y
  2    Babara       F      Miss        N          Y
  3    Clara        F      Miss        N          Y
  4    Diana        F      Miss        N          Y
  5    Eva          F      Miss        N          Y
  6    Frankie      M      Mr.         Y          N
  7    George       M      Mr.         Y          N
  8    Henry        M      Mr.         Y          N
  9    Ivan         M      Mr.         Y          N
 10    John         M      Mr.         Y          N
The output result is exactly what we want.

Then, actually what is wrong with the programs for person_1 and person_2 ? This is due to SAS macro substitution. SAS will actually [ substitute ] the macro statements into the program. If you do not understand this [ substitution ] activity, you will fall into this trap.

So, for the program fragment for person_1 :
data person_1 ;
     set person ;
     length initial  $4. 
            wear_tie $1.
            wear_bra $1.     
            ;
     if sex = 'M' then %init_male   ;
     if sex = 'F' then %init_female ;
run;
, after substitution, the program fragment will become :
data person_1 ;
     set person ;
     length initial  $4. 
            wear_tie $1.
            wear_bra $1.     
            ;
     if sex = 'M' then initial= 'Mr.' ;
     wear_tie= 'Y' ;
     wear_bra= 'N' ;
     if sex = 'F' then initial= 'Miss' ;
     wear_tie= 'N' ;
     wear_bra= 'Y' ;
run;
(I have re-indent the coding for easy reading. The actual substitution text is quite difficult to read.)

As one can see, after substitution, only the [ initial ] field will work as what we expected. The fields wear_tie and wear_bra will be set regardless of the sex field. At the last 2 lines of the program, all the wear_tie will be N and wear_bra will be Y regardless of male or female record. Thus, the person_1 will have this not-expected output.

However, when calling a macro inside a pair of do-end statements, these 2 lines of coding :
if sex = 'M' then do; %init_male   ; end;
if sex = 'F' then do; %init_female ; end;
, after substitution, will become :
if sex = 'M' then do;
                    initial=  'Mr.' ;
                    wear_tie= 'Y'   ;
                    wear_bra= 'N'   ;
                  end;
if sex = 'F' then do;
                    initial=  'Miss' ;
                    wear_tie= 'N'    ;
                    wear_bra= 'Y'    ;
                  end;
(Again, I have re-indent the coding for easy reading.)

This is what we expected.

To avoid falling into this trap, always remember that SAS macro can be treated as text substitution.


Alvin SIU
2008-12-03
Copyright/Licence Information:
All information and coding in this article is offered at no charge for NON-COMMERCIAL PERSONAL USE only.
This blog and the coding is copyright.
Reproduction of this blog and its coding in whole or in part in paper or digitally or in any other forms without the explicit written permission of the author is strictly prohibited.

Disclaimer:
All information in this article is distributed "as is" and is UNSUPPORTED.
NO WARRANTY of any kind is expressed or implied.
You use AT YOUR OWN RISK.
The author will not be liable for any data loss, damages, and loss of profits or any other kind of tangible or intangible loss while using or misusing wholly or partly of the information.

No comments:

Duplicate Open Current Folder in a New Window

Sometimes after I opened a folder in Win7, I would like to duplicate open the same folder again in another explorer window. Then, I can ope...