Monday, April 24, 2017

The Little SAS Program's BIG Night Out















The little SAS program’s official name was Extract_Transform_Load_ 0314.sas.  But, that name was much too formal, way too long, and did not roll off of the tongue very easily at all.  So, everybody simply called her:  ETL Pi.

ETL Pi was conceived in a 2-hour project strategy meeting in conference room 427C on a Monday afternoon in early May.  She was born three days later in Susan G.’s cubicle in the west wing of the sixth floor.  Susan was a mid-level programmer/analyst with seven years of SAS programming experience.  She first crafted ETL Pi to perform only the basic functions of extracting data from the Oracle database and loading it into an Excel spreadsheet for use by the analysts.  Susan wanted to make sure that she had the basic data transportation mechanisms—the Extract and Load parts of the ETL process—correctly programmed before moving on.  So, ETL Pi began her existence as a compact 18-line SAS program that performed the bare minimum functions on test data.

After Susan was confident that she had the Extract and Load mechanisms correct, she added the lines of code that did the Transform part of the ETL process by creating constructed variables from those extracted from the Oracle database.  She sent the resulting Excel files to the analysts for review and approval.  The analysts had a lot of comments, ideas, proposals, recommendations, suggestions, and counter-suggestions; so Susan modified, changed, revised, amended, adjusted, reformed and altered ETL Pi accordingly until the analysts were finally satisfied.  This resulted in ETL Pi growing to modest 57 lines of lean, efficient SAS code.

Susan productionized ETL Pi by first creating a directory for the Excel file that would be created each night and granting permissions to that directory to the analysts.  Then, she modified ETL Pi so that it would be run on the UNIX server where the production Oracle database was housed.  Susan did this by adding five SAS/Connect statements to ETL Pi specifying how to connect to the UNIX server and specifying which portions of the SAS code would be sent to execute on the server. The additions of the OPTIONS, SIGNON, RSUBMIT, ENDRSUBMIT, and SIGNOFF statements brought ETL Pi to 62 lines of SAS code.

Finally, Susan scheduled ETL Pi to run at 10:00pm every night from her PC workstation.  To do this, she first created a .bat file that held the command to execute SAS and which also pointed at the full path to the ETL Pi SAS program.  Next, she invoked the Windows Task Scheduler and went through the several manual point-and-click steps to schedule ETL Pi to run at 10:00pm every night.  Satisfied with her preparations, Susan went home that evening anticipating a good first run and a clean SAS log.

ETL Pi was nervous as the clock slowly ticked away towards 10:00pm.  She was used to working with small amounts of data and jostling for network bandwidth during the day.  But, she didn’t know quite what to expect when running on the UNIX server where the big, important corporate programs ran.  Tonight promised to be quite an adventure.

At 10:00pm sharp, the Windows Task Scheduler executed the .bat file.  That invoked SAS in batch mode and started-up ETL Pi.  Once initiated, ETL Pi executed an OPTIONS statement that contained two SAS/Connect-oriented options.  COMMAMID=TCP specified for SAS to use the TCP communications access method, and REMOTE=UNIX427 specified that ETL Pi would be working on the UNIX427 server.  Next, ETL Pi executed a SIGNON statement that pointed to a SAS/Connect script that actually initiated the connection between Susan’s PC and the UNIX server.  The script executed and instantiated a SAS session on the UNIX server that ETL Pi could directly communicate with.

The next block of ETL Pi’s SAS code was sandwiched between RSUBMIT and ENDRSUBMIT SAS/Connect statements.  So, that block of code would be submitted directly to the UNIX server and be processed there.

Oh, goodness; here I go! ETL Pi thought.  Then,

            Wheeeeeeeeeee!

...she exclaimed as she was whizzed at dizzying speeds across the network, through half a mile of fiber-optic cable, and into the SAS session executing on the UNIX427 server sitting in the data center.

ETL Pi emerged into a very crowded UNIX server.  There were system backups, database updates, database backups, five java programs, and dozens of other SAS programs all jostling for computer resources.  Not intimidated, ETL Pi bravely elbowed her way into the mix of programs, trying to get as much memory, disk space, and processor cores as she possibly could.  I have a job to do, she thought.  And I am going to do my best to get it done.

ETL Pi executed a LIBNAME statement that had the proper connection string information to access a production Oracle database using SAS/Access Interface to Oracle.  Next she executed the SQLGENERATION=DBMS option in an OPTIONS statement.  That option allows SAS to use in-database procedures which executes select SAS procedures within the Oracle database itself in order to take advantage of the database’s native processing speed and facilities.

Next, ETL Pi submitted a PROC SUMMARY to summarize specific data in one of the large tables in the Oracle database.  Because of the SQLGENERATION option, that summary was efficiently done within the Oracle database and the small result set was returned to the SAS work library within ETL Pi’s UNIX SAS session.

That was all that ETL Pi needed to do on the UNIX server.  She finished up by using PROC DOWNLOAD to download the summarized SAS data set in the work library on UNIX to the SAS work library on the PC.

Okay, I’m going home now, she thought as the ENDRSUBMIT statement executed.

                     Wheeeeeeeeeee!

...she shouted in exhilaration as control was passed back to the SAS session running back on the PC.

Once she was back home, ETL Pi executed two ODS statements, a DATA step, and the Report procedure that created the Excel spreadsheet for Susan’s analyst clients.  Then, with a heavy sigh, she executed the SIGNOFF statement which terminated the UNIX SAS session and closed the connection with the server.

Just before ending, ETL Pi took a quick look at the SAS log.  Not a WARNING or ERROR anywhere in the log, she beamed proudly.  Susan is going to be very happy when she reviews it in the morning.  Now that was fun, and I can’t wait to do it again tomorrow night!

And with that, the little SAS program's big night out came to an end.


---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Monday, April 17, 2017

Hack 4.19 Up-Coding Text Variables with the VVALUE Function


SAS Programming Professionals,

Did you know that you can use a SAS format to up-code text variables based on a numeric response variable?

The VVALUE function returns a character value that is the results of bouncing a numeric variable’s value against a format.  The format for the VVALUE function is:

VVALUE (variable)

Let’s look at an example:

/* Create potato chip format */
proc format;
      value chiptype

      1 = "Ruffles"
      2 = "Lay's"
      3 = "Kettle"
      4 = "Miss Vickies"
      5 = "Pringles"
      other = "Don't Know"
      ;
run;


/* Create example data set */
data class;
set  sashelp.class;

Chip_Response = mod(age, 10);

format Chip_Response chiptype.;

run;


/* Upcode value of Chip_Response_Text */
data class_Response;
set  class;

length Chip_Response_Text $12.;

Chip_Response_Text = vvalue(Chip_Response);

run;

proc print noobs data=class_Response;
title1 "Results of the Potato Chip Question";
var Name Age Chip_Response Chip_Response_Text;
format Chip_Response 8.;
run;

First we create the CHIPTYPE format to associate numeric responses with the related potato chip type as per the questionnaire.  The first DATA step creates the Chip_Response variable that represents the various subject’s response to the potato chip question in the survey.

The second DATA step uses the VVALUE function to assign the correct response text value to Chip_Response_Text, based on the value found in variable Chip_Response.  Since the argument to VVALUE is Chip_Response, the function looks to the CHIPTYPE format for the values that should be returned.  It does so because CHIPTYPE is the format associated with variable Chip_Response.  Chip_Response values of 1 result in Chip_Response_Text values of “Ruffles”, values of 2 result in “Lay’s”, and so forth; as per the CHIPTYPE format.

The output of the PROC PRINT looks like this:

Results of the Potato Chip Question

       Chip_
   Chip_      Response_
Name       Age    Response    Text
Alfred      14           4    Miss Vickies
Alice       13           3    Kettle
Barbara     13           3    Kettle
Carol       14           4    Miss Vickies
Henry       14           4    Miss Vickies
James       12           2    Lay's
Jane        12           2    Lay's
Janet       15           5    Pringles
Jeffrey     13           3    Kettle
John        12           2    Lay's
Joyce       11           1    Ruffles
Judy        14           4    Miss Vickies
Louise      12           2    Lay's
Mary        15           5    Pringles
Philip      16           6    Don't Know
Robert      12           2    Lay's
Ronald      15           5    Pringles
Thomas      11           1    Ruffles
William     15           5    Pringles

Note that if you do not specify a length for a variable created with the VVALUE function, the variable length will default to 200 characters.  Ouch!

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, April 10, 2017

Hack 4.18 Up-Coding Text Variables with the CHOOSEC Function


SAS Programming Professionals,

Did you know that you can use a SAS function to up-code text variables based on a numeric response variable?

The CHOOSEC function returns a character value that represents the results of choosing from a list of arguments based on the value of an index variable.  The format for the CHOOSEC function is:

CHOOSEC (index-expression, selection-1 <,...selection-n>)

Let’s look at an example:

/* Create example data set */
data class;
set  sashelp.class;

Chip_Response = mod(age, 10);

run;

/* Upcode value of Chip_Response_Text */
data class_Response;
set  class;

length Chip_Response_Text $12.;

Chip_Response_Text = choosec(Chip_Response,"Ruffles", "Lay's", "Kettle", "Miss Vickies", "Pringles", "Don't Know");

run;

The first DATA step simply produces the data set we will be using for the example.  In that DATA step, we create a variable Chip_Response that represents the various subject’s response to the potato chip question in the survey.

The second DATA step uses the CHOOSEC function to assign the correct response text value to Chip_Response_Text, based on the value found in variable Chip_Response.  Chip_Response values of 1 result in Chip_Response_Text values of “Ruffles”, values of 2 result in “Lay’s”, and so forth.  Consequently, Chip_Response_Text ends up containing the correct text value (“Ruffles”, “Lays”, “Kettle”, etc.) as coded in the CHOOSEC function in the DATA step

What’s that?  You are curious as to why I specified a length for Chip_Response_Text instead of letting the length of that variable simply default?  Good question!  I did that because the CHOOSEC function has a nasty habit of assigning variables a length of 200 if the variable is created within the function.  So, keep that caveat in mind when you use the CHOOSEC function in your own programs.

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, March 27, 2017

Hack 4.17 Sorting Variables Within an Observation


SAS Programming Professionals,

Did you know that you can sort the values of variables within an observation into ascending order? 

The SORTN CALL routine allows you to do just that.  Here is an example:

options nodate nonumber;

/* Create the Test data set*/
proc sort in=sashelp.mdv out=work.mdv(keep=origcity shipdate);
      by origcity;
run;

proc transpose data=mdv out=mdvtrans prefix=date;
      by origcity;
      var shipdate;
run;

/* Print the obs with unsorted dates */
proc print data=mdvtrans;
title1 "Dates Not Sorted";
run;

/* Use the SORTN call routine to sort dates */
data sorted_MDV_dates;
set  mdvtrans;

array sortdate{28} date1 - date28;

call sortn (of sortdate(*));

run;

/* Print the obs with sorted dates */
proc print data=sorted_mdv_dates;
title1 "Dates Sorted";
run;

The first part of the program sets up the data set that we will use in the DATA Step that is highlighted in red.  The first steps create a SAS data set with 28 date variable in it.  That data set is output in the first PROC PRINT.  In the DATA Step we use the SORTN to sort the dates (DATE1 – DATE28) into ascending order.  The second PROC PRINT reveals that DATE1 – DATE28 have been sorted into ascending date sequence.  One statement; it’s that simple!

CALL SORTN can be useful in cases where you need to order a series of event dates in an observation for a particular subject id.  Note that its twin function, CALL SORTC, can be used to sort character variables into ascending order within an observation.

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, March 13, 2017

Hack 4.16 Simplifying IF/THEN/ELSE Statements Using the IFC and IFN Functions


SAS Programming Professionals,

Did you know that the IFC and IFN functions allow you to have 1-line IF/THEN/ELSE-like statements? 

The IFC and IFN functions can be handy when you have a single THEN condition and a single ELSE condition and you would like to have an action taken if the test condition results in missing values.

The IFC function returns a character value based on whether or not an expression is true, false, or missing.  The basic form of the function is:

               IFC(logical-expression, value-returned-when-true, value-returned-when-false <,value-returned-when-missing>)

Here is an example:

data class_gender;
set  sashelp.class;

length gender $7;
label gender = "Gender";

gender = ifc(sex="M", "Male  ", "Female", "Unknown");

run;

In the example, we test variable SEX for the value “M”.  If SEX = “M”, the variable GENDER is set to “Male”.  If not, it is set to “Female”.  If SEX contains a missing value, it is set to “Unknown”.

The IFN function returns a numeric value based on whether or not an expression is true, false, or missing.  The basic form of the function is:

               IFN(logical-expression, value-returned-when-true, value-returned-when-false <,value-returned-when-missing>)

Here is an example:

data electric_surcharge;
set  sashelp.electric;

label surcharge = "Surcharge (B)";

surcharge = ifn(year=2005, revenue *.001, revenue * .000025, 0);

run;

In the example, we test the variable YEAR for the numeric value of 2005.  If YEAR = 2005, the variable SURCHARGE is set to REVENUE * .001.  If YEAR does not equal 2005, SURCHARGE is computed to be REVENUE * .000025.  If YEAR contains missing values, SURCHARGE is set to Zero.

You can probably think of dozens of uses for the IFC and IFN functions.  I like to think of these two functions as “IF/THEN/ELSE in a can”.

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, March 6, 2017

Hack 4.15 Setting Multiple Variables to Missing in a Single Statement


SAS Programming Professionals,

Did you know that you can easily set one or more SAS variables to missing values in a single statement?

The CALL Missing SAS call routine allows you to set character and numeric variables to missing values.  Call Missing sets numeric variables to numeric missing (.), and character variables to character missing (blank).  The basic format is:

CALL MISSING(varname1<, varname2, ....>);

Here is an example:

data vetted_class;
set  sashelp.class;

if age < 14 then call missing(weight, height, sex);

run;

In the example, we have specific criteria for setting variables to missing—when the subject is younger than 14 years old.  When that is the case, we set the values of weight, height, and sex to missing.  Note that weight and height are numeric variables, while sex is a character variable.  It makes no difference at all to CALL MISSING; it sets them all to missing!

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, February 27, 2017

Hack 4.14 Running System Commands from a SAS Program


SAS Programming Professionals,

Did you know that you can execute Windows operating system commands from within a SAS Data step?

You can do this by using the CALL SYSTEM call routine.  Unlike its blunt cousin the X statement, CALL SYSTEM can be executed conditionally within a DATA step.  This makes it much more flexible and useful.  Here is an example:

/* Rename directory on Mondays*/
data _null_;

if weekday(today()) = 2 then do;

      call system("rename c:\temp\mikedir mikedir_old");
      call system("md c:\temp\mikedir");

end;

run;

ods rtf file="c:\temp\mikedir\classlist.rtf";

proc print noobs data=sashelp.class;
title1 "List of Subjects in Weight/Height Study";
run;

ods rtf close;

In the example, the DATA _NULL_ step houses the CALL SYSTEM call routines.  They are only executed when the current day of the week is Monday.  When they are executed, the first CALL SYSTEM sends a command to Windows to rename the MIKEDIR directory to MIKEDIR_OLD.  The second CALL SYSTEM sends a command to Windows to create a new directory named MIKEDIR.  The subsequent PROC PRINT creates an RTF file in the new MIKEDIR directory.

Obviously, the Windows system commands must be syntactically correct for the CALL SYSTEM call routine to work properly.  Here is one of the many online sources for looking up Windows system commands:


The previously mentioned X command executes unconditionally.  That means that if we were to replace both CALL SYSTEM’s with X commands in the program above, they would executed no matter what day of the week it is.  They just barge right in.  That is why I prefer the CALL SYSTEM call routine for shelling out operating system commands from within my SAS programs.

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers