SAS Programming Professionals,
Did you know that you can have SAS store duplicate
observations that are removed during a sort that uses the NODUPKEY option in a
separate SAS data set?
The DUPOUT= option of PROC SORT specifies a SAS data set
that is used to store observations eliminated from a sorted data set when the
NODUPKEY option is employed. SAS sorts
the specified data set and each observation that has duplicate BY statement
variable values is written to the DUPOUT= data set as it is eliminated from the
sorted data set. Here is an example:
proc sort
nodupkey
data=sashelp.class
out=sorted_class
dupout=dupkeys
;
by
sex age;
run;
The SAS log for this example looks like this:
1
proc sort nodupkey data=sashelp.class out=sorted_class dupout=dupkeys;
2
by sex age;
3
run;
NOTE: There were 19 observations read
from the data set SASHELP.CLASS.
NOTE: SAS sort was used.
NOTE: 8 observations with duplicate key values
were deleted.
NOTE: The data set WORK.SORTED_CLASS has
11 observations and 5 variables.
NOTE: The data
set WORK.DUPKEYS has 8 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process
time):
real time 0.01 seconds
cpu time 0.01 seconds
Note that 8 observations with duplicate values of SEX and
AGE were written to the DUPKEYS SAS data set.
This can come in very handy for QC-ing programs where you
are eliminating duplicate-keyed observations during a sort. You can double-check the observations in the
DUPOUT= SAS data set and verify that the sort really got rid of observations
that are not needed in the original sorted data set.
Best
of luck in all your SAS endeavors!
----MMMMIIIIKKKKEEEE
(aka
Michael A. Raithel)
Excerpt from the book:
Did You Know That? Essential Hacks for Clever SAS Programmers
I plan to post each and every one of the hacks in the book to social media on a weekly basis. Please pass them along to colleagues who you know would benefit.