Monday, December 19, 2016

Hack 4.5 Determining How Large an Integer You Can Store in a Numeric Variable

SAS Programming Professionals,

Did you know that the CONSTANT function can help you to determine how large an integer you can store in a numeric variable with a given number of bytes? 

This can be helpful when you are working at slimming-down SAS data sets by reducing the number of bytes for numeric variables.  If you know—and know for sure—the largest value that will be stored in that variable, and that you will only be using it to store integers, then you can set its byte-size accordingly.

Here is how you can use the CONSTANT function with the EXACTINT constant:

data _null_;

do i = 2 to 8 by 1;

exactint = constant('EXACTINT',i);

put i exactint;

end;

run;

The PUT statement produces the following:

2 32
3 8192
4 2097152
5 536870912
6 137438953472
7 3.5184372E13
8 9.0071993E15

So, a numeric variable created with a length of 4 will allow me to store integers up to a value of 2,097,152. Note that these results may vary depending on the platform you are using to run SAS.

Two other CONSTANT function constants that are worth exploring are:  BIG and SMALL.

·        BIG = constant(‘BIG’); - Returns the largest double-precision floating-point number (8-bytes) that can be represented on your computer.

·        SMALL = constant(‘SMALL’); - Returns the smallest double-precision floating-point number (8-bytes) that can be represented on your computer.

Check out the CONSTANT function for more insights on how numbers are handled on your computing platform in the online documentation at support.sas.com.  Then, see if it makes sense to slim down some of your observations by right-sizing your numeric variables.

Best of luck in all your SAS endeavors!

 ---MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Author of the new cult classic for computer programmers:  It Only Hurts When I Hit <ENTER>
Print edition:  http://tinyurl.com/z8bzx2e
Kindle edition: http://tinyurl.com/zypgqa7

The hack above is an excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, December 12, 2016

It Only Hurts When I Hit ENTER


SAS Programming Professionals,

If you want to take a break from the serious side of programming, then check out my new book:  It Only Hurts When I Hit <ENTER>

My new book is available on Amazon.com as a print edition:  http://tinyurl.com/z8bzx2e
…or a Kindle e-book:  http://tinyurl.com/zypgqa7
Best of luck in all of your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Amazon Author Page:
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, December 5, 2016

Hack 4.4 Creating SAS Date Variables with the MDY Function

SAS Programming Professionals,

Did you know that you can create a SAS date variable from a set of month, date, and year variables?

The MDY function creates a SAS date value from month, day, and year values.  It can be helpful when you process data where the unfortunate decision was made to collect month, day, and year in separate variables.  Here  is an example of the MDY function:

data newdate;

format birthdate worddate.;

birth_day = 27;
birth_month = 4;
birth_year = 1975;

birthdate = mdy(birth_month, birth_day, birth_year);

yearsold = year(today()) - year(birthdate);

run;

proc print noobs;
run;

In the example, we employ the MDY function to create a SAS date variable, BIRTHDATE, from separate birth_month, birth_day, and birth_year variables.  Then, we use BIRTHDATE to compute the subject’s age.

A derivative of this simple technique can be used in cases where you get the month and year of a date, but for privacy’s sake, you do not get the day.  You can hardcode a given day value—say 15 because it is the middle of most months—and create a reasonable date that you can use in calculations.  For example, consider the DATA step above without the day variable:

data newdate;

format birthdate worddate.;

birth_month = 4;
birth_year = 1975;

birthdate = mdy(birth_month, 15, birth_year);

yearsold = year(today()) - year(birthdate);

run;

In this example, we hard-coded 15 in the MDY function so that we could compute a valid date for BIRTHDATE.  We could have just as easily used a 1 or a 28 to signify the first day of the month, or a day near the end of the month.  Which is the better way to go?  It is your manager’s sage choice, of course!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, November 28, 2016

Hack 4.3 Computing Smallest and Largest Value for Variables in An Observation

SAS Programing Professionals,

Did you know that you can easily compute the smallest or the largest value for a group of variables within an observation in a DATA step?

The SMALLEST and LARGEST functions return the smallest and largest value, respectively, for the group of values that you specify.  You can also have these functions return the second from smallest (or largest) value, third from smallest (or largest) value, fourth from smallest (or largest) value, and so on.  The form of these functions is:

SMALLEST(k, value-1<,value-2,…>)

LARGEST(k, value-1<,value-2,…>)

…where k = is a numeric constant, variable, or expression that specifies which value to return.  And, value-1 (value-2, etc.) specifies the value of a numeric constant, variable, or expression to be processed.

Here is an example:

options nodate nonumber;

data class(keep=oldest youngest agerange);

array ages{19} ages1 - ages19;

do i = 1 to totobs;

set sashelp.class nobs=totobs;

ages{i} = age;

end;

label youngest = "Age of Youngest Participant"
      oldest   = "Age of Oldest Participant"
      agerange = "Range of Participant Ages"
      ;

youngest = smallest(1,of ages1-ages19);
oldest   = largest(1,of ages1-ages19);
agerange = oldest - youngest;

run;

proc print noobs data=class label;
run;

In this example, we create an array (AGES) to contain the ages for all participants in our study data set.  We use the DO loop to iterate through the CLASS data set and load the ages of the participants into the array.  Then, we use the SMALLEST and the LARGEST functions to return the smallest and largest age values stored in the array.  We also calculate the range of ages.

If we had wanted the next to smallest age value, we would have coded:

 youngest = smallest(2,of ages1-ages19);

Get the picture?

If you rolled your eyes when reviewing this contrived example, think how powerful these functions are when you have a flattened patient file of admission/discharge dates.  You can quickly and easily compute the earliest or latest admission dates as well as the earliest or latest discharge dates.  These functions are in the top compartment of my own SAS tips-and-techniques toolbox!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, November 21, 2016

Hack 4.2 Calculating Geodetic Distances with the ZIPCITYDISTANCE Function

SAS Programming Professionals,

Did you know that you can calculate the geodetic distance, in miles, between any two zip codes with SAS?

The aptly named ZIPCITYDISTANCE function allows you to specify both of the zip codes and then compute the distance between them.  Here is an example:

data gointoseattle;

seattle_dist = put(zipcitydistance(20850,98101),comma.);

label seattle_dist = "My Travel Distance from Rockville to Seattle";

run;

proc print noobs data=gointoseattle label;

run;

The code, above, produces the following output:

My Travel Distance
  from Rockville
  to Seattle

    2,315

Distances between Rockville and destinations in countries that do not have a US Zip code can be calculated via the GEODIST function as long as you have the longitude and the latitude of both points.  Here is an example:

data gointoseattle;

seattle_dist = put(geodist(39.085920,-77.174389,47.611330,-122.333219,"M"),comma.);

label seattle_dist = "My Travel Distance From Rockville to Seattle";

run;

proc print noobs data=gointoseattle label;

run;

This example used the GEODIST function to calculate the distance between Rockville and Seattle, and produced the same output as the previous example. 

Of course, if you are working with foreign clients, you might choose to substitute “K” (kilometers) for “M” (miles) to keep the distance in their perspective.  Miles; kilometers; what’s a few klicks between friends?

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, November 14, 2016

Hack 4.1 Avoiding the Hidden “Problem” in the TRANWRD Function

SAS Programming Professionals,

Did you know that:  the TRANWRD function, which can replace or remove all occurrences of a given word, has a hidden gotcha

The gotcha is that, if you do not provide a length statement for a new variable that receives the output from the TRANWRD function, that variable defaults to a length of 200 characters.

Here is an example of the gotcha:

data _null_;

length newstatement1 $34;

statement = "I enjoy going to SUGI conferences.";

newstatement1 = tranwrd(statement,"SUGI", "SGF");

newstatement2 = tranwrd(statement,"SUGI", "SGF");

length_newstatement1 = lengthc(newstatement1);

length_newstatement2 = lengthc(newstatement2);

put "length_newstatement1 = " length_newstatement1;
put "length_newstatement2 = " length_newstatement2;

run;

If you submit the code, above, you will see that newstatement1 ends up with a length of 34, while newstatement2 ends up with a length of 200.  That is because newstatement1 had a length statement specifying its size before it was used with the TRANWRD function.  Conversely, newstatement2 was created fresh and new when used in the TRANWRD function and SAS--not knowing what to make of this infant variable--decided to make it 200 characters in length; the default for the TRANWRD function.

So, what are the ramifications of this gotcha?  The ramifications are that you could end up with larger variables than you intend to--needlessly inflating the size of a SAS data set--if you are not careful to declare the length of new SAS character variables created by using the TRANWRD function.  You have been warned!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Monday, November 7, 2016

Hack 3.25 Updating SAS Data Set Metadata with PROC DATASETS

SAS Programming Professionals,

Did you know that you do not have to re-create an entire SAS data set in order to specify formats, informats, labels, or to rename variables? 

Here is an example of what a novice SAS programmer might code to add formats to price and date, add an informat to date, add labels for product and date, and to rename the variable Holiday to existing SAS data set SGFLIB.SNACKS:

data sgflib.snacks;
set  sgflib.snacks;

      format price dollar6.2
             date  worddate.
             ;
      informat date mmddyy10.;
      label    product = "Snack Name"
               date    = "Sale Date"
               ;
      rename Holiday = Holiday_Sale;

run;

This DATA step causes SAS to read every observation in SGFLIB.SNACKS and make a completely new version of it.  The only difference between the original and the new SAS data set is the change to the metadata of the four specified variables.  This is a waste of computer resources; especially if the data set being re-created is very large.

A much more efficient way to accomplish these changes is to use the DATASETS Procedure.  PROC DATASETS reads the descriptor portion of the SAS data set—stored in the first physical page on disk—makes the changes to the metadata, and then writes the page back to the disk.  So, instead of hundreds or thousands or millions of observations being read, SAS simply reads a single data set page.

This PROC DATASETS code is equivalent to the DATA step, above:

proc datasets library=sgflib;
modify snacks;
      format price dollar6.2
             date  worddate.
             ;
      informat date mmddyy10.;
      label product = "Snack Name"
            date    = "Sale Date"
            ;
      rename Holiday = Holiday_Sale;

run;
quit;

The PROC DATASETS code is clean, efficient, and much more professional!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.

Monday, October 24, 2016

Hack 3.24 Taking Advantage of the INHERITLIB Option with SAS/CONNECT

SAS Programming Professionals,

Did you know that when using SAS/CONNECT, you can have a program running on a server “reach back” and process data directly from a SAS data set on your client machine (workstation)? 

The aptly-named INHERITLIB option allows a remotely-running SAS program to process data from your workstation or network as if it were local to the remote server.  Using the INHERITLIB option obviates the need for you to use PROC UPLOAD to send your data to the remote server for processing.  This can be very handy when you do not want to have multiple copies of a data set spread across your various computing platforms.

The INHERITLIB option can be coded on either the RSUBMIT or the SIGNON statement.  Here is an example:

libname pclib "C:\PRODDATA";

rsubmit inheritlib=(pclib);

proc summary nway data=pclib.shoeaud(where=(region="Canada"));

      class region product subsidiary;

      var   sales  returns;

output out=sumshoes sum=;

run

endrsubmit;

run;

In this example, we are allocating the “C:\proddata” directory (on a Windows workstation) to the “PCLIB” libref in our local SAS session.  Thereafter, we RSUBMIT a simple PROC SUMMARY that will run on our Linux server.  The RSUBMIT statement specifies that the remote session is to “inherit” the PCLIB libref.  So, when the SUMMARY procedure executes on the remote Linux server, it “reaches back” to the workstation and drags the relevant observations across the network to the waiting SUMMARY procedure running on the Linux server.  Pretty neat, eh?

Of course, you need to weigh the additional network traffic associated with remote servers processing local data against any gains realized by using the INHERITLIB option.  But, you already knew that; didn’t you?

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.

Monday, October 17, 2016

Hack 3.23 Sorting Character Variables That Have Leading Numbers in Them

SAS Programming Professionals,

Did you know that you can sort character variables with leading numbers in them into proper numeric order?

You can do this in the SORT procedure by using the SORTSEQ=LINGUISTIC option and also specifying NUMERIC_COLLATION=ON as the collating rule.  Sound like a lot of technical mumbo-jumbo to you?  Well, a simple example will doubtlessly make this clearer.

The following code:

data test;
input address $12.;
datalines;
123 Main St.
05 Main St.
45 Main St.
8 Main St.
;

proc sort data=test;
 by address;
run;

proc print data=test noobs;
run;

… produces this output:

  address

05 Main St.
123 Main St.
45 Main St.
8 Main St.

As you can see, the street addresses are not really sorted into ascending order by house number.  However, if we change the sort statement to this:

proc sort data=test sortseq=linguistic(numeric_collation=on);
by address;
run;

…we get the following:

  address

05 Main St.
8 Main St.
45 Main St.
123 Main St.

…which is much better for our interviewers to use when planning their routes down Main street USA!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.

Monday, October 10, 2016

Hack 3.22 Simplifying PROC DATASETS Output with the NOLIST Option

SAS Programming Professionals,

Did you know that you can stop PROC DATASETS from inflating your SAS log with those pesky long library contents listings?

By default, PROC DATASETS prints a list of the contents of the SAS library it is operating on to the SAS log.  If you have one or two SAS files in that library, then it may not be noteworthy.  But, if you have scores, dozens, hundreds, or more SAS files in the SAS data library, it can be burdensome to have such information plopped into the middle of the log—especially if you use the DATASETS procedure multiple times in the same program.

The NOLIST option on the PROC DATASETS statement prevents the procedure from printing a library directory in your SAS log.  Here is an example.

This code:

libname ctemp "c:\temp";

proc datasets library=ctemp;
       modify drugs;
            rename subject = patient;
      run;
quit;

Produces the following SAS log:

1    libname ctemp "c:\temp";
NOTE: Libref CTEMP was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\temp

2    proc datasets library=ctemp;
Directory

Libref         CTEMP
Engine         V9
Physical Name  c:\temp
Filename       c:\temp


Member
#  Name                 Type     File Size  Last Modified

1  BUY                  DATA          5120  19May11:09:53:04
2  CARS                 DATA         74752  14Jun11:16:17:29
3  CLASS                DATA          5120  09Mar12:16:50:44
4  CLASSCAT             CATALOG      13312  13Jan12:11:06:42
5  DRUGS                DATA         13312  09Mar12:16:54:29
6  FINAL                DATA          5120  18Oct10:15:01:04
7  FORMATS              CATALOG    2499584  17Sep07:10:25:50
8  HEART                DATA        902144  09Mar12:16:55:46
9  HEART_BACKUP         DATA        885760  05Jul11:16:08:07
10  MASSZIPS             DATA        377856  20May11:16:15:19
11  MYFMTS               CATALOG      17408  13Feb12:14:34:11
12  ORSALES              DATA        111616  05Jun09:08:30:45
13  PRDSAL2              DATA       2790400  24Jun11:11:26:33
14  SALES                DATA          5120  12Nov09:09:46:31
15  SAMPLE_RESPITE       DATA          9216  19Jul11:11:54:20
16  SAMPLE_RESPITE1      DATA          5120  19Jul11:13:14:25
17  SASMACR              CATALOG      33792  07Nov11:10:06:08
18  SHOEAUD              DATA        132096  12Nov09:10:41:52
19  SHOES                DATA         17408  12Nov09:10:28:26
    SHOES                AUDIT       140288  12Nov09:10:28:26
20  SUMCLASS             DATA          5120  03Dec09:16:14:27
21  ZIPCODE_11Q2_UNIQUE  DATA      33915904  21May11:09:40:21
    ZIPCODE_11Q2_UNIQUE  INDEX       762880  21May11:09:40:20
22  ZIPMIL_11Q2          DATA        492544  21May11:09:40:22
    ZIPMIL_11Q2          INDEX        21504  21May11:09:40:22
23  ZIPMISC_11Q2         DATA         33792  21May11:09:40:22
    ZIPMISC_11Q2         INDEX         9216  21May11:09:40:22
3
4        modify drugs;
5            rename subject = patient;
NOTE: Renaming variable subject to patient.
6        run;

NOTE: MODIFY was successful for CTEMP.DRUGS.DATA.
7    quit;

Interesting enough, but definitely TMI!  The NOTE’s tell the tale that we were successful in renaming SUBJECT to PATIENT and that our modify of the DRUGS data set was successful.  So, we do not really need any additional information about the SAS data library at this time.

We decided that we want to rename PATIENT back to SUBJECT, and this time use the NOLIST option:

proc datasets library=ctemp nolist;

      modify drugs;
            rename patient = subject ;
      run;
quit;

… resulting in this log:

10   proc datasets library=ctemp nolist;
11
12       modify drugs;
13           rename patient = subject ;
NOTE: Renaming variable patient to subject.
14       run;

NOTE: MODIFY was successful for CTEMP.DRUGS.DATA.
15   quit;

Now, that’s exactly what we want: a leaner, cleaner log!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)

Excerpt from the book:  Did You Know That?  Essential Hacks for Clever SAS Programmers

I plan to post each and every one of the hacks in the book to social media on a weekly basis.  Please pass them along to colleagues who you know would benefit.