Monday, April 24, 2017

The Little SAS Program's BIG Night Out















The little SAS program’s official name was Extract_Transform_Load_ 0314.sas.  But, that name was much too formal, way too long, and did not roll off of the tongue very easily at all.  So, everybody simply called her:  ETL Pi.

ETL Pi was conceived in a 2-hour project strategy meeting in conference room 427C on a Monday afternoon in early May.  She was born three days later in Susan G.’s cubicle in the west wing of the sixth floor.  Susan was a mid-level programmer/analyst with seven years of SAS programming experience.  She first crafted ETL Pi to perform only the basic functions of extracting data from the Oracle database and loading it into an Excel spreadsheet for use by the analysts.  Susan wanted to make sure that she had the basic data transportation mechanisms—the Extract and Load parts of the ETL process—correctly programmed before moving on.  So, ETL Pi began her existence as a compact 18-line SAS program that performed the bare minimum functions on test data.

After Susan was confident that she had the Extract and Load mechanisms correct, she added the lines of code that did the Transform part of the ETL process by creating constructed variables from those extracted from the Oracle database.  She sent the resulting Excel files to the analysts for review and approval.  The analysts had a lot of comments, ideas, proposals, recommendations, suggestions, and counter-suggestions; so Susan modified, changed, revised, amended, adjusted, reformed and altered ETL Pi accordingly until the analysts were finally satisfied.  This resulted in ETL Pi growing to modest 57 lines of lean, efficient SAS code.

Susan productionized ETL Pi by first creating a directory for the Excel file that would be created each night and granting permissions to that directory to the analysts.  Then, she modified ETL Pi so that it would be run on the UNIX server where the production Oracle database was housed.  Susan did this by adding five SAS/Connect statements to ETL Pi specifying how to connect to the UNIX server and specifying which portions of the SAS code would be sent to execute on the server. The additions of the OPTIONS, SIGNON, RSUBMIT, ENDRSUBMIT, and SIGNOFF statements brought ETL Pi to 62 lines of SAS code.

Finally, Susan scheduled ETL Pi to run at 10:00pm every night from her PC workstation.  To do this, she first created a .bat file that held the command to execute SAS and which also pointed at the full path to the ETL Pi SAS program.  Next, she invoked the Windows Task Scheduler and went through the several manual point-and-click steps to schedule ETL Pi to run at 10:00pm every night.  Satisfied with her preparations, Susan went home that evening anticipating a good first run and a clean SAS log.

ETL Pi was nervous as the clock slowly ticked away towards 10:00pm.  She was used to working with small amounts of data and jostling for network bandwidth during the day.  But, she didn’t know quite what to expect when running on the UNIX server where the big, important corporate programs ran.  Tonight promised to be quite an adventure.

At 10:00pm sharp, the Windows Task Scheduler executed the .bat file.  That invoked SAS in batch mode and started-up ETL Pi.  Once initiated, ETL Pi executed an OPTIONS statement that contained two SAS/Connect-oriented options.  COMMAMID=TCP specified for SAS to use the TCP communications access method, and REMOTE=UNIX427 specified that ETL Pi would be working on the UNIX427 server.  Next, ETL Pi executed a SIGNON statement that pointed to a SAS/Connect script that actually initiated the connection between Susan’s PC and the UNIX server.  The script executed and instantiated a SAS session on the UNIX server that ETL Pi could directly communicate with.

The next block of ETL Pi’s SAS code was sandwiched between RSUBMIT and ENDRSUBMIT SAS/Connect statements.  So, that block of code would be submitted directly to the UNIX server and be processed there.

Oh, goodness; here I go! ETL Pi thought.  Then,

            Wheeeeeeeeeee!

...she exclaimed as she was whizzed at dizzying speeds across the network, through half a mile of fiber-optic cable, and into the SAS session executing on the UNIX427 server sitting in the data center.

ETL Pi emerged into a very crowded UNIX server.  There were system backups, database updates, database backups, five java programs, and dozens of other SAS programs all jostling for computer resources.  Not intimidated, ETL Pi bravely elbowed her way into the mix of programs, trying to get as much memory, disk space, and processor cores as she possibly could.  I have a job to do, she thought.  And I am going to do my best to get it done.

ETL Pi executed a LIBNAME statement that had the proper connection string information to access a production Oracle database using SAS/Access Interface to Oracle.  Next she executed the SQLGENERATION=DBMS option in an OPTIONS statement.  That option allows SAS to use in-database procedures which executes select SAS procedures within the Oracle database itself in order to take advantage of the database’s native processing speed and facilities.

Next, ETL Pi submitted a PROC SUMMARY to summarize specific data in one of the large tables in the Oracle database.  Because of the SQLGENERATION option, that summary was efficiently done within the Oracle database and the small result set was returned to the SAS work library within ETL Pi’s UNIX SAS session.

That was all that ETL Pi needed to do on the UNIX server.  She finished up by using PROC DOWNLOAD to download the summarized SAS data set in the work library on UNIX to the SAS work library on the PC.

Okay, I’m going home now, she thought as the ENDRSUBMIT statement executed.

                     Wheeeeeeeeeee!

...she shouted in exhilaration as control was passed back to the SAS session running back on the PC.

Once she was back home, ETL Pi executed two ODS statements, a DATA step, and the Report procedure that created the Excel spreadsheet for Susan’s analyst clients.  Then, with a heavy sigh, she executed the SIGNOFF statement which terminated the UNIX SAS session and closed the connection with the server.

Just before ending, ETL Pi took a quick look at the SAS log.  Not a WARNING or ERROR anywhere in the log, she beamed proudly.  Susan is going to be very happy when she reviews it in the morning.  Now that was fun, and I can’t wait to do it again tomorrow night!

And with that, the little SAS program's big night out came to an end.


---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Monday, April 17, 2017

Hack 4.19 Up-Coding Text Variables with the VVALUE Function


SAS Programming Professionals,

Did you know that you can use a SAS format to up-code text variables based on a numeric response variable?

The VVALUE function returns a character value that is the results of bouncing a numeric variable’s value against a format.  The format for the VVALUE function is:

VVALUE (variable)

Let’s look at an example:

/* Create potato chip format */
proc format;
      value chiptype

      1 = "Ruffles"
      2 = "Lay's"
      3 = "Kettle"
      4 = "Miss Vickies"
      5 = "Pringles"
      other = "Don't Know"
      ;
run;


/* Create example data set */
data class;
set  sashelp.class;

Chip_Response = mod(age, 10);

format Chip_Response chiptype.;

run;


/* Upcode value of Chip_Response_Text */
data class_Response;
set  class;

length Chip_Response_Text $12.;

Chip_Response_Text = vvalue(Chip_Response);

run;

proc print noobs data=class_Response;
title1 "Results of the Potato Chip Question";
var Name Age Chip_Response Chip_Response_Text;
format Chip_Response 8.;
run;

First we create the CHIPTYPE format to associate numeric responses with the related potato chip type as per the questionnaire.  The first DATA step creates the Chip_Response variable that represents the various subject’s response to the potato chip question in the survey.

The second DATA step uses the VVALUE function to assign the correct response text value to Chip_Response_Text, based on the value found in variable Chip_Response.  Since the argument to VVALUE is Chip_Response, the function looks to the CHIPTYPE format for the values that should be returned.  It does so because CHIPTYPE is the format associated with variable Chip_Response.  Chip_Response values of 1 result in Chip_Response_Text values of “Ruffles”, values of 2 result in “Lay’s”, and so forth; as per the CHIPTYPE format.

The output of the PROC PRINT looks like this:

Results of the Potato Chip Question

       Chip_
   Chip_      Response_
Name       Age    Response    Text
Alfred      14           4    Miss Vickies
Alice       13           3    Kettle
Barbara     13           3    Kettle
Carol       14           4    Miss Vickies
Henry       14           4    Miss Vickies
James       12           2    Lay's
Jane        12           2    Lay's
Janet       15           5    Pringles
Jeffrey     13           3    Kettle
John        12           2    Lay's
Joyce       11           1    Ruffles
Judy        14           4    Miss Vickies
Louise      12           2    Lay's
Mary        15           5    Pringles
Philip      16           6    Don't Know
Robert      12           2    Lay's
Ronald      15           5    Pringles
Thomas      11           1    Ruffles
William     15           5    Pringles

Note that if you do not specify a length for a variable created with the VVALUE function, the variable length will default to 200 characters.  Ouch!

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

Monday, April 10, 2017

Hack 4.18 Up-Coding Text Variables with the CHOOSEC Function


SAS Programming Professionals,

Did you know that you can use a SAS function to up-code text variables based on a numeric response variable?

The CHOOSEC function returns a character value that represents the results of choosing from a list of arguments based on the value of an index variable.  The format for the CHOOSEC function is:

CHOOSEC (index-expression, selection-1 <,...selection-n>)

Let’s look at an example:

/* Create example data set */
data class;
set  sashelp.class;

Chip_Response = mod(age, 10);

run;

/* Upcode value of Chip_Response_Text */
data class_Response;
set  class;

length Chip_Response_Text $12.;

Chip_Response_Text = choosec(Chip_Response,"Ruffles", "Lay's", "Kettle", "Miss Vickies", "Pringles", "Don't Know");

run;

The first DATA step simply produces the data set we will be using for the example.  In that DATA step, we create a variable Chip_Response that represents the various subject’s response to the potato chip question in the survey.

The second DATA step uses the CHOOSEC function to assign the correct response text value to Chip_Response_Text, based on the value found in variable Chip_Response.  Chip_Response values of 1 result in Chip_Response_Text values of “Ruffles”, values of 2 result in “Lay’s”, and so forth.  Consequently, Chip_Response_Text ends up containing the correct text value (“Ruffles”, “Lays”, “Kettle”, etc.) as coded in the CHOOSEC function in the DATA step

What’s that?  You are curious as to why I specified a length for Chip_Response_Text instead of letting the length of that variable simply default?  Good question!  I did that because the CHOOSEC function has a nasty habit of assigning variables a length of 200 if the variable is created within the function.  So, keep that caveat in mind when you use the CHOOSEC function in your own programs.

Best of luck in all your SAS endeavors!

---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e 
Kindle edition: http://tinyurl.com/zypgqa7 

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers