Tuesday, July 26, 2016

Hack 3.15 Keeping Specific SAS Files During Mass Deletions

SAS Programming Professionals,

Did you know that you can delete all SAS files in a given SAS data library _EXCEPT_ those that you specifically list in the SAVE statement using PROC DATASETS? 

Think of PROC DATASETS’ SAVE statement as a kind of KEEP statement for SAS files instead of for SAS variables.  Consider this example:

proc datasets library=raithlib;
      save FaveCDList FaveMovieList;
run;
quit;

That program deletes the dozen or so SAS data sets, catalogs, etc. in my RAITHLIB SAS data library, leaving my two favorite SAS data sets FaveCDList and FaveMovieList behind.

If I had a catalog with the same name as one of those two SAS data sets, I would simply add “/ memtype=data” to the end of the SAVE statement above.  That catalog would be history, but my SAS data sets wouldn’t be!
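
If you would like to try this without risking a real library, here is a minimal sketch.  It assumes your site allows the DLCREATEDIR system option so that a throwaway library can be created under WORK.  The SCRATCH libref, the Junk data set, and the format catalog are hypothetical and exist only for illustration.

```sas
/* Assumption: DLCREATEDIR lets the LIBNAME statement create the folder */
options dlcreatedir;
libname scratch "%sysfunc(pathname(work))/scratch";

/* Two data sets to keep and one to sacrifice (contents are arbitrary) */
data scratch.FaveCDList scratch.FaveMovieList scratch.Junk;
      set sashelp.class;
run;

/* A format catalog that shares its name with one of the keepers */
proc format library=scratch.FaveCDList;
      value $gender 'F' = 'Female'
                    'M' = 'Male';
run;

/* Save only the two named data sets, as described in the hack above */
proc datasets library=scratch nolist;
      save FaveCDList FaveMovieList / memtype=data;
run;
quit;
```

Check the log afterward to see which members SAS reports as saved and which as deleted before you point the same statement at a library you care about.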

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.

Monday, July 18, 2016

Hack 3.14 Jump Starting PROC REPORT

SAS Programming Professionals,

Did you know that you can get PROC REPORT to “jump start” itself?

The breadth and scope of the SAS programming language make it a challenge to remember the exact syntax of every procedure.  So, many of us use shortcuts whenever we can find them.  One such shortcut is the LIST option in PROC REPORT.  The LIST option directs SAS to write the PROC REPORT code to the log… without line numbers!

So, this SAS code:

proc report data=sashelp.class list noexec;
run;

…produces this log entry:

1    proc report data=sashelp.class list noexec;
2    run;

PROC REPORT DATA=SASHELP.CLASS LS=96  PS=54  SPLIT="/" CENTER ;
COLUMN  Name Sex Age Height Weight;

DEFINE  Name / DISPLAY FORMAT= $8. WIDTH=8     SPACING=2   LEFT "Name" ;
DEFINE  Sex / DISPLAY FORMAT= $1. WIDTH=1     SPACING=2   LEFT "Sex" ;
DEFINE  Age / SUM FORMAT= BEST9. WIDTH=9     SPACING=2   RIGHT "Age" ;
DEFINE  Height / SUM FORMAT= BEST9. WIDTH=9     SPACING=2   RIGHT "Height" ;
DEFINE  Weight / SUM FORMAT= BEST9. WIDTH=9     SPACING=2   RIGHT "Weight" ;
RUN;

We can now cut-n-paste the PROC REPORT code from the log into a SAS program and edit it to make a perfect report for our project.  Not bad, eh?
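
As a rough sketch of that editing step, here is one way the pasted code might be trimmed down.  The column headings and formats below are my own illustrative choices, not something the LIST option generates; the main change is switching the numeric columns from SUM to DISPLAY so each student's values are listed rather than totaled.

```sas
/* Edited copy of the code that the LIST option wrote to the log */
proc report data=sashelp.class split="/" center;
      column Name Sex Age Height Weight;

      define Name   / display "Student/Name";
      define Sex    / display "Gender";
      /* DISPLAY instead of SUM: list each value, do not total it */
      define Age    / display format=3.  "Age";
      define Height / display format=6.1 "Height";
      define Weight / display format=6.1 "Weight";
run;
```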

Wondering about the NOEXEC option?  Not surprisingly, that option tells SAS NOT to execute the REPORT procedure and create a report.  Might as well save a few CPU cycles whenever we can! 

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)


Friday, July 15, 2016

Hack 3.13 Inserting Blank Lines in PROC PRINT Output

SAS Programming Professionals,

Did you know that you can make PRINT procedure output more readable by inserting blank lines?

If you are still tethered to using good old PROC PRINT to create reports for your users, then you know how crowded some of the listings can be when you have a lot of data to print.  The BLANKLINE option of PROC PRINT allows you to insert a blank line every N rows.  Here is an example:

proc sort data=sashelp.class out=class;
      by sex;
run;

proc print noobs data=class sumlabel blankline=5;
      by sex;
      sum weight height;
      label sex = "Gender";
      title1 "Class Roster By Gender";
run;

The BLANKLINE=5 option on our PROC PRINT statement tells SAS to insert a blank line after every five observations.  The result looks, in part, like this:

Gender=F

Name        Age    Height    Weight
Alice        13      56.5      84.0
Barbara      13      65.3      98.0
Carol        14      62.8     102.5
Jane         12      59.8      84.5
Janet        15      62.5     112.5

Joyce        11      51.3      50.5
Judy         14      64.3      90.0
Louise       12      56.3      77.0
Mary         15      66.5     112.0
Gender              545.3     811.0

Notice the nice blank line between Janet and Joyce.  I like to set the value to either 10 or 20 on those rare occasions when I am creating long listings with PROC PRINT.  I wonder what value you will end up using.
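
For a longer listing, a sketch along these lines shows the larger spacing.  The choice of SASHELP.CARS, the OBS=40 limit, and the variables printed are just assumptions to keep the example small.

```sas
/* Print 40 rows with a blank line after every 10 observations */
proc print data=sashelp.cars(obs=40) noobs blankline=10;
      var make model type msrp;
      title1 "First 40 Cars, Blank Line Every 10 Rows";
run;

/* Clear the title so it does not carry over to later output */
title1;
```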

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)


Monday, July 11, 2016

Hack 3.12 Identifying Duplicate Variable Values with PROC MEANS

SAS Programming Professionals,

Did you know that you can use the MEANS procedure to identify duplicate variable values in a SAS data set?

This can come in very handy when you need to QC a SAS data set in which a particular variable is supposed to have unique values, or in which the combination of values across a set of variables is supposed to be unique.

Here is an example:

%MACRO IDENTIFY_DUPES(LIBREF=, DSNAME=, VARLIST=);

/* Count the observations for each combination of the CLASS variables */
proc means data=&LIBREF..&DSNAME nway noprint missing;
      class &VARLIST;
      output out=duplicates(drop=_type_
                            rename=(_freq_ = duplicate_count)
                            where=(duplicate_count > 1)) sum=;
run;

/* Report the combinations that occur more than once */
proc print data=duplicates noobs label;
      var &VARLIST duplicate_count;
      title1 "Duplicate Values in the &DSNAME SAS Data Set";
      title2 "Duplicate Count for Variables: &VARLIST";
      label duplicate_count = "Duplicate Count";
run;

%MEND IDENTIFY_DUPES;

%IDENTIFY_DUPES(LIBREF=SASHELP, DSNAME=prdsal2,
    VARLIST=country county prodtype year);

This macro accepts three parameters:

·        LIBREF – The libref of the SAS data library that contains our target data set
·        DSNAME – The name of our target SAS data set
·        VARLIST – The list of variables whose combination we want to check for duplicate values.

The MEANS procedure specifies our target SAS data set by using the LIBREF and DSNAME macro variables.  The NWAY option states that we only want output for the full combination of all of the CLASS variables, not for each variable separately.  NOPRINT suppresses that pesky, unneeded list output.  The MISSING option directs PROC MEANS to treat missing values of the CLASS variables as valid grouping values, so observations with a missing key are counted too.  We use the list of variables in the &VARLIST macro variable in the CLASS statement, so the procedure summarizes the data by each unique combination of those specific variables.

In the OUTPUT statement, we rename PROC MEANS’ _FREQ_ variable to DUPLICATE_COUNT, and only keep the summarized observations whose DUPLICATE_COUNT is greater than 1.  PROC MEANS creates _FREQ_ automatically; it holds the number of observations found for each combination of the CLASS variables.  The SUM= keyword requests sums of the remaining numeric variables, which our report does not print.

The PRINT procedure creates a report of duplicate variable values surfaced by the MEANS procedure.
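
To see the core technique without the macro wrapper, here is a small sketch run against SASHELP.CLASS.  Using AGE as the variable that is "supposed" to be unique is purely an assumption for demonstration, and AGE_DUPES is a hypothetical output data set name.

```sas
/* Count observations per AGE value; keep only ages that occur more than once */
proc means data=sashelp.class nway noprint missing;
      class age;
      output out=age_dupes(drop=_type_
                           rename=(_freq_ = duplicate_count)
                           where=(duplicate_count > 1)) sum=;
run;

/* List each repeated AGE value with the number of times it appears */
proc print data=age_dupes noobs label;
      var age duplicate_count;
      label duplicate_count = "Duplicate Count";
run;
```

Any age that appears only once in the data set is filtered out by the WHERE= data set option, so it never reaches the report.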

Here is part of the resulting list output from the PROC PRINT:



Duplicate Values in the prdsal2 SAS Data Set
Duplicate Count for Variables: country county prodtype year

Country   County   Product Type   Year   Duplicate Count
Canada             FURNITURE      1995               576
Canada             FURNITURE      1996               576
Canada             FURNITURE      1997               576
Canada             FURNITURE      1998               576
Canada             OFFICE         1995               576
Canada             OFFICE         1996               576
Canada             OFFICE         1997               576
Canada             OFFICE         1998               576



If you are concerned about duplicate variable values in your SAS data sets, this is the macro for you!

Best of luck in all your SAS endeavors!
----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
