Sunday, March 22, 2015

Raiders of the Lost Spreadsheet

SAS Programming Professionals,

Have you ever peered intently into an unfamiliar data delivery directory, realized what was in it, rolled over onto your side, stared blankly into the distance, and dejectedly uttered something akin to:

    "Spreadsheets!  Why did it have to be spreadsheets?"

If so, then we are definitely on the same page.  Why does it always have to be spreadsheets?

The answer to that question is actually pretty obvious when you think about it.  The popularity of Microsoft Office has made Excel one of the most popular mediums for storing data.  It is used extensively in grade schools, middle schools high schools, and colleges.  People with home businesses use it; office administrators use it; clerical staff use it; scientists use it; lawyers use it; hospital workers use it; Federal, state and local government workers use it; and programmers use it too.

An individual who needs to store data in electronic format and then process it may not have SAS, or C++, or JAVA, or C#, or PYTHON, or PHP, or R, or MATLAB, or ColdFusion, or FOCUS, or FORTRAN, or Groovy, or JavaScript, or MOBY, or MUMPS, or NATURAL, or Perl, or PHP, or PL/SQL, or PowerShell, or Python, or S-PLUS, or Visual Basic installed on his or her PC.  But, that person will undoubtedly have Microsoft Office and thus have Excel.  That is why it always has to be spreadsheets.

But, processing data stored in spreadsheets is not really a problem for intrepid SAS programmers.  When I go on a data exploration expedition where there is a good chance of encountering spreadsheets, I pack the usual:  my brown leather jacket, fedora, and bullwhip.  But, most importantly, I put SAS/Access Interface to PC Files into my backpack.

SAS/Access Interface to PC Files is a SAS for Windows product that allows you to read, write, and update data in Excel and Access.  As such, it is a must-have for your Windows SAS installation.

Here is an example of a program that I use to map out the contents of an unexplored spreadsheet:

   ods rtf file="G:\BigProject\Worksheets in NewDataSpreadsheet.rtf";

   libname xlslib "G:\NewProject\DeliveryDirectory\NewDataSpreadheet.xlsx" access=readonly;

   proc sql;
   create table WorkSheets as
   select distinct(compress(MEMNAME,"',$")) as WorkSheet_Name,
       name as ColumnName
   from dictionary.columns
   where libname = 'XLSLIB';

   proc print noobs data=WorkSheets;
   var WorkSheet_Name ColumnName;
   title1 "Workseets in NewDataSpreadheet.xlsx";

   ods rtf close;

The ODS statement specifies that my report will be created as an RTF document.  Because I have SAS/Access Interface to PC Files, the LIBNAME statement allocates the NewDataSpreadheet.xlsx spreadsheet much the same way as it would for a SAS data set.  (Notice that I specified access=readonly so that I do not accidentally update the spreadsheet).  Since I have "LIBNAME-d" the spreadsheet, information about its worksheets and column names is now available in the SAS Dictionary Tables.

I use PROC SQL to extract the name of each worksheet (variable WorkSheet_Name) in the Excel file; and the names of the columns (variable ColumnName) within each worksheet and then plop them into a SAS data set for further exploration. The code snippet compress(MEMNAME,"',$") gets rid of the annoying quotes and dollar signs that are found in spreadsheet MEMNAMEs.  Then, I use the PRINT procedure to create a report.  A simple, neat, quick, and easily macro-tized piece of code.

Here are several good references that you can use to find out more about processing spreadsheets with SAS:

Armed with those resources, some pluck, a sense of adventure, and with your own trusty copy of SAS/Access Interface to PC Files, you too can be a raider of the lost spreadsheet!

Best of luck in all your SAS endeavors!

aka Michael A. Raithel
Amazon Author's Page: