Sunday, March 22, 2015

Raiders of the Lost Spreadsheet

SAS Programming Professionals,

Have you ever peered intently into an unfamiliar data delivery directory, realized what was in it, rolled over onto your side, stared blankly into the distance, and dejectedly uttered something akin to:

    "Spreadsheets!  Why did it have to be spreadsheets?"

If so, then we are definitely on the same page.  Why does it always have to be spreadsheets?

The answer to that question is actually pretty obvious when you think about it.  The popularity of Microsoft Office has made Excel one of the most popular mediums for storing data.  It is used extensively in grade schools, middle schools high schools, and colleges.  People with home businesses use it; office administrators use it; clerical staff use it; scientists use it; lawyers use it; hospital workers use it; Federal, state and local government workers use it; and programmers use it too.

An individual who needs to store data in electronic format and then process it may not have SAS, or C++, or JAVA, or C#, or PYTHON, or PHP, or R, or MATLAB, or ColdFusion, or FOCUS, or FORTRAN, or Groovy, or JavaScript, or MOBY, or MUMPS, or NATURAL, or Perl, or PHP, or PL/SQL, or PowerShell, or Python, or S-PLUS, or Visual Basic installed on his or her PC.  But, that person will undoubtedly have Microsoft Office and thus have Excel.  That is why it always has to be spreadsheets.

But, processing data stored in spreadsheets is not really a problem for intrepid SAS programmers.  When I go on a data exploration expedition where there is a good chance of encountering spreadsheets, I pack the usual:  my brown leather jacket, fedora, and bullwhip.  But, most importantly, I put SAS/Access Interface to PC Files into my backpack.

SAS/Access Interface to PC Files is a SAS for Windows product that allows you to read, write, and update data in Excel and Access.  As such, it is a must-have for your Windows SAS installation.

Here is an example of a program that I use to map out the contents of an unexplored spreadsheet:

   ods rtf file="G:\BigProject\Worksheets in NewDataSpreadsheet.rtf";

   libname xlslib "G:\NewProject\DeliveryDirectory\NewDataSpreadheet.xlsx" access=readonly;

   proc sql;
   create table WorkSheets as
   select distinct(compress(MEMNAME,"',$")) as WorkSheet_Name,
       name as ColumnName
   from dictionary.columns
   where libname = 'XLSLIB';
   quit;

   proc print noobs data=WorkSheets;
   var WorkSheet_Name ColumnName;
   title1 "Workseets in NewDataSpreadheet.xlsx";
   run;

   ods rtf close;

The ODS statement specifies that my report will be created as an RTF document.  Because I have SAS/Access Interface to PC Files, the LIBNAME statement allocates the NewDataSpreadheet.xlsx spreadsheet much the same way as it would for a SAS data set.  (Notice that I specified access=readonly so that I do not accidentally update the spreadsheet).  Since I have "LIBNAME-d" the spreadsheet, information about its worksheets and column names is now available in the SAS Dictionary Tables.

I use PROC SQL to extract the name of each worksheet (variable WorkSheet_Name) in the Excel file; and the names of the columns (variable ColumnName) within each worksheet and then plop them into a SAS data set for further exploration. The code snippet compress(MEMNAME,"',$") gets rid of the annoying quotes and dollar signs that are found in spreadsheet MEMNAMEs.  Then, I use the PRINT procedure to create a report.  A simple, neat, quick, and easily macro-tized piece of code.

Here are several good references that you can use to find out more about processing spreadsheets with SAS:

Armed with those resources, some pluck, a sense of adventure, and with your own trusty copy of SAS/Access Interface to PC Files, you too can be a raider of the lost spreadsheet!

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
aka Michael A. Raithel
Amazon Author's Page:
http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0 






Sunday, March 15, 2015

What is a SAS Index Good for Anyhow?


 SAS Programming Professionals,

If there was a SAS performance tool that could drastically reduce your program’s I/O’s, lower its CPU time, and decrease its run time, would you use it?  Of course you would!  Such a performance tool exists; it is called a SAS index.  SAS Indexes can dramatically improve the performance of programs that access small subsets of observations from large SAS data sets.  They do this by only accessing and returning the observations that you specify in a WHERE expression, instead of reading the entire SAS data set.

It is easy to understand how a SAS index can help you to directly access the observations that you need in a particular SAS data set.  As an exercise, do the following:
  • Open support.sas.com in a web browser
  • Type "rtrace" in the search window at the top of the page
  • Click on the Search button
The support.sas.com search function returns about seventy links.  When you click on any of those links, you get a page in the documentation that discusses the SAS RTRACE facility.  This saves you the tedious effort of going through the entire SAS Online documentation, page-by-page, looking for occurrences of the word “rtrace”.

A SAS index is analogous to the search function discussed above.  A good index allows your programs to quickly access the subset of SAS observations that you need from a large SAS data set when you specify a key variable value (or values) that must be matched.  This can dramatically improve the speed and efficiency of your SAS programs.

Conversely, badly conceived SAS indexes return far too many observations and are no better than reading the entire data set sequentially.  In the analogy, above, consider how many pages would be returned and how much longer it would take if you searched the SAS Online Documentation for the word “SAS”.  That is why it is important to know more about the selection criteria for index variables, as well as the actual creation and use of SAS indexes.

After deciding that an index is appropriate for your subsetting purposes, you have three tools to choose from to create one: 
  1. The DATASETS procedure, 
  2. The SQL procedure, 
  3. The DATA option in the DATA step or in a Procedure.  
When you first use one of these tools to create an index, SAS creates a separate index file and associates it with your SAS data set.  The index file has the same name as the original SAS data set, but has a suffix of ".sas7bndx."  SAS stores additional indexes in that file and deletes the file when all indexes have been removed from the data set.

You can create a Simple index from a single variable, or a Composite index from two or more variables.  A SAS data set can have as many indexes as you think are necessary.

You can exploit indexes with the WHERE statement, the BY statement, or with the KEY statement used in conjunction with either a SET or MODIFY statement.  In doing so, you will be increasing the efficiency of your SAS programs that use the index.  That is what SAS indexes are good for!

There is enough information about SAS indexes to fill an entire book.  Here are a few resources to consider if you are interested in learning more about SAS indexes:

If you find that your programs are consistently extracting very small subsets of observations from very large SAS data sets, then SAS indexes just might be the right tool for you.

Best of Luck in all your SAS endeavors!  

----MMMMIIIIKKKKEEEE 
aka Michael A. Raithel 

Amazon Author's Page: 

Sunday, March 8, 2015

I Know What You Did Last Summer!

SAS Programming Professional,

I know what you did last summer.

If it was unintentional, then you probably don't know what I am talking about.  If it was intentional, then you probably thought that I would never find out.  Either way, the damage is done.  The actions that you took on that warm summer evening are as clear to me now as they would have been if I had been watching over your shoulder while you did them.  I know what you did last summer: You updated one of my SAS data sets.

We work on the same project  and both have read, write, update, and delete rights to the project's directories.  The production SAS data set that I created for the spring data delivery was inexplicably updated in the summer.  And, you were the one who did it.  Because we have been teammates for a while, I am giving you the benefit of the doubt.  I bet that you made a copy of the production SAS program for a different use, updated it, but forgot to change the LIBREF to point to your test SAS data library.  So when you ran it, you accidentally deleted 400 observations and updated 273 observations in the production data set.

Oh, you want to know how I determined it was you and how I know exactly what changed.

Well, because that production data set is very important, I used PROC DATASETS to create a SAS audit trail file for it.  SAS audit trails record changes to SAS data sets.  They can record the before and after image of observations that were changed, the type of change, the date/time of the change, and the userid of the person who changed the SAS data set.  So, SAS audit trails can be very useful in a shared directory environment where many staff members have access to important SAS data sets.

Here is the code I used to create the audit trail for the production SAS data set:

proc datasets library=prodlib nolist;
        audit SpringDeliveryData;
        initiate;
        log admin_image=yes
              before_image=yes
              data_image=no
              error_image=yes;
        run;
quit;

When I executed that DATASETS procedure code, SAS created a file named SpringDeliveryData.sas7baud in the same directory as the SAS data set.  When an observation is updated, added, or deleted from SpringDeliveryData, SAS writes an observation to the audit trail data set containing the variables in the original SAS data set and six specific audit trail variables.  Of note are _ATDATETIME_ which specifies the date/time of the change; _ATOPCODE_ which specifies the type of change that took place--e.g. add, delete, modify; and _ATUSERID_ which specifies the userid of the person whose SAS program made the change.

When I noticed that SpringDeliveryData had been modified, I used a PROC PRINT to dump the audit trail file.  That is how I know that the data set was updated at 5:27 PM on August 5th by a program submitted under your userid.

You are interested in using SAS audit trails for your own production SAS data sets?  Great!  You can find a comprehensive write-up in the documentation on support.sas.com at this link:

http://support.sas.com/documentation/cdl/en/lrcon/67885/HTML/default/viewer.htm#n0ndg2uekz7qkbn1caoki2hzqx8l.htm

Don't fret about the updates to the SpringDeliveryData SAS data set.  I am going to request that our systems administrator restore the data set to the day before the summer update.  That way, we will have the original data set available in case our client has questions about it.

Good to know that I was right that you accidentally updated the production data set last summer.  Oh, don't go.  Unfortunately we have one more thing to talk about:

I know what you did last fall...

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
aka Michael A Raithel
Amazon Author's Page:  http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0

Sunday, March 1, 2015

Excuse Me; But, Is That a 32-Bit or a 64-Bit SAS Catalog?

SAS Programming Professionals,

I don’t know about you, but I get pretty determined to prove them wrong when people tell me that I cannot do something.  I am not talking about fantastical things such as flying through the heart of the sun and out the other side without getting burned.  Nor, am I talking about social things like becoming president of the United States or an author on the New York Times Bestseller list.  And, I am not talking about physical things such as swimming 2.4 miles, biking 112 miles, and running 26.2 miles back-to-back on the same day.  No, I am talking about being told that I cannot do something with SAS.

For example, I was once told:



      Consequently, when I was told that there was no SAS facility for programmatically determining whether a Windows SAS catalog was a 32-bit catalog or a 64-bit catalog, I resolved to figure out a way to do it.

The background is that my organization plans to migrate from 32-bit SAS to 64-bit SAS as part of a SAS 9.3 to SAS 9.4 upgrade.  SAS data sets are compatible between the two bitages, but SAS catalogs are not.  Stating the problem: you cannot open a 64-bit SAS catalog with 32-bit SAS.  So, it is advantageous to have a tool for determining which SAS catalog is which bitage as you move forward into a mixed-bit programming environment during the transition.

I did my due diligence and researched every place that I thought I might be able to find a way to differentiate the bitage.  An indicator in PROC CATALOG if I ran it with the STAT option enabled?  Nope.  Something in the directory portion of a PROC CONTENTS listing with the DETAILS option specified?  Nope.  A lesser-known option of PROC DATASETS?  Nope.  How about a flag in the Dictionary Tables CATALOGS table or in the SASHELP Views VCATALG view?  Nope.  A Usage Note on support.sas.com.  Nope.  A  SAS technical paper published at either SAS Global Forum or a Regional SAS Users Group?  Nope, not that either.

I figured that if you could not tell the difference within SAS, itself, how about if you looked at the catalogs as simply files.  So, I got a 32-bit SAS catalog and a 64-bit SAS catalog and opened them with WordPad to take a look inside.  Bingo!  There was enough information in the very first record of both catalog files to determine the difference.  So, I wrote a program that tested for the string of characters that told the tale.

Here is the SAS program that I wrote:

/*Macro to determine bitage of a SAS catalog */

%MACRO Test_Cat_Bitage(SASCatalog);

filename sascat "&SASCatalog";

data decompcat(keep=CAT_BITS SAS_Catalog);

length CAT_BITS $8
         SAS_Catalog $50;

infile sascat obs=1 truncover;

input bigline $charzb32767. ;

if index(bigline, "W32_7PRO") > 0 then CAT_BITS = "W32_7PRO";
      else if index(bigline, "X64_7PRO") > 0 then CAT_BITS = "X64_7PRO";
      else CAT_BITS = "Unknown ";

SAS_Catalog = strip("&SASCatalog");

label CAT_BITS    = "Bitage of SAS Catalog"
        SAS_Catalog = "Full Path SAS Catalog Name"
        ;

run;

proc append base=AllCatalogs
                  data=decompcat;
run;

%MEND Test_Cat_Bitage;

/* Example of executing the macro to read a catalog file */

%Test_Cat_Bitage(c:\temp\gender.sas7bcat);

As you can see, the program determines the bitage of a SAS catalog by treating the catalog as a file, not as a catalog.  It opens the catalog file and inspects the first line for a specific character string: W32_7PRO for 32-bit catalogs; X64_7PRO for 64-bit catalogs.  Once it determines the bitage, the program writes an observation to data set AllCatalogs in the WORK library.  Each observation in AllCatalogs has two variables:  CAT_BITS, which specifies whether the catalog is 32 or 64 bits, and SAS_Catalog, which is the full path name of the SAS catalog file.  

The object of this particular setup is to run the macro against several, a score, dozens, hundreds, or thousands of SAS catalogs and build a SAS data set which identifies their bitage.  After that, one may choose to copy AllCatalogs to a permanent SAS data set, or create a report from it.  Or both.

Being a talented SAS programmer yourself, I would bet that you also do not like it when people tell you that you cannot do something with SAS.  Right?  Yea, it goes with the territory.  How about posting a comment telling us about a particularly difficult SAS problem you encountered and the clever way that you resolved it?  Bet you can’t do that.

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A Raithel)
Amazon Author's Page:  http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0