Monday, September 26, 2016

Hack 3.20 Saving Duplicate Observations Removed by PROC SORT

SAS Programming Professionals,

Did you know that you can have SAS store duplicate observations that are removed during a sort that uses the NODUPKEY option in a separate SAS data set?

The DUPOUT= option of PROC SORT specifies a SAS data set that is used to store observations eliminated from a sorted data set when the NODUPKEY option is employed.  SAS sorts the specified data set and each observation that has duplicate BY statement variable values is written to the DUPOUT= data set as it is eliminated from the sorted data set.  Here is an example:

proc sort nodupkey
      by sex age;

The SAS log for this example looks like this:

1    proc sort nodupkey data=sashelp.class out=sorted_class dupout=dupkeys;
2        by sex age;
3    run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.               
NOTE: SAS sort was used.
NOTE: 8 observations with duplicate key values were deleted.
NOTE: The data set WORK.SORTED_CLASS has 11 observations and 5 variables.
NOTE: The data set WORK.DUPKEYS has 8 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Note that 8 observations with duplicate values of SEX and AGE were written to the DUPKEYS SAS data set.

This can come in very handy for QC-ing programs where you are eliminating duplicate-keyed observations during a sort.  You can double-check the observations in the DUPOUT= SAS data set and verify that the sort really got rid of observations that are not needed in the original sorted data set.

Best of luck in all your SAS endeavors!

(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.