SAS
Programming Professionals,
Did you know
that you can sort very large SAS data sets with the judicious use of SAS views?
Once in a
while, we all get a SAS data set so large that our computer does not have
enough space to sort it. One way around
this problem is to use SAS views to break the original data set into segments
that can be sorted separately and then reassembled into a final sorted data set. An overview of this methodology is:
- Create several views which subset the data set into segments aligned with a subgroup values of the sort variables.
- Input each of the views into its own PROC SORT and write the output to individual data sets.
- Use the APPEND procedure to append the sorted data sets into one large sorted data set.
- Use the DATASETS procedure to assert the sort for the final SAS data set.Here is an example which uses the SASHELP.CLASS data set. Yes, I do realize SASHELP.CLASS only has 19 observations, but it serves our overall purposes for illustrating the methodology.
/*Create a data set that is not sorted by first
name */
proc sort
data=sashelp.class out=work.class;
by
weight;
run;
/* Create Views based on first letter of first
name */
data class1 / view=class1;
set work.class;
where "A"
<= substr(name,1,1)
<= "G";
run;
data class2 / view=class2;
set work.class;
where "H"
<= substr(name,1,1)
<= "R";
run;
data class3 / view=class3;
set work.class;
where "S"
<= substr(name,1,1)
<= "Z";
run;
/* Sort views out to SAS data sets*/
proc sort
data=class1 out=class1_sorted;
by name;
run;
proc sort
data=class2 out=class2_sorted;
by name;
run;
proc sort
data=class3 out=class3_sorted;
by name;
run;
/* Apppend sorted data sets and clean up after
yourself*/
proc append
base=class_sorted_by_name data=class1_sorted;
run;
proc append
base=class_sorted_by_name data=class2_sorted;
run;
proc append
base =class_sorted_by_name data=class3_sorted;
run;
proc datasets
library=work nolist;
modify class_sorted_by_name(sortedby=
name);
run;
delete class1_sorted
class2_sorted class3_sorted;
delete class1(memtype=view)
class2(memtype=view)
class3(memtype=view);
quit;
The first
thing we do is create a data set to use for this example. We sort the CLASS data set by age to make
sure it is not sorted by NAME. Now, we
are ready to go.
The first
real step in this methodology is to create three DATA step views. Each one of the views constructs a subset of
the original SAS data set by using a
logical sequence of the key variable value we will be using to sort the data
set. (The number of observations
returned by all three DATA step views will be equal to the number of
observations in the original data set). Once
the views are created, each one is input to a sort. The SORT procedure inputs the view; which in
turn only feeds the related portion of the original data set into the
sort. That portion of the original data
set is sorted by NAME and output to its own sorted data set.
We continue
by using PROC APPEND to concatenate the sorted segments of our SAS data set
together into a new SAS data set.
Finally, we use the DATASETS procedure to specify that our final SAS
data set is sorted by NAME. And, we are
careful to clean up those messy, large, intermediate SAS data sets by deleting
them. For good measure, we delete the
views too.
This is
obviously a simplistic example. But, a
clever person such as yourself can doubtlessly extend it to sort the impossibly
large real-world SAS data sets that drift into your own work-life!
Best of luck
in all of your SAS endeavors!
----MMMMIIIIKKKKEEEE
(aka Michael
A. Raithel)
Excerpt from the book:
Did You Know That? Essential Hacks for Clever SAS Programmers
I plan to post each and every one of the hacks in the book to social media on a weekly basis. Please pass them along to colleagues who you know would benefit.
No comments:
Post a Comment