
SAS Administration: Process Data Faster in RDBMS by Buffering the Data in Memory

Submitted 2012-10-03 – 8:37 AM

Out of the box, access to third-party relational databases can be painfully slow if the connection is not configured properly.  I recently started using PostgreSQL 9.1, an open source database, to store high volumes of data for my SAS Global Forum 2013 paper.  In this example I am using an ODBC connection to a database I defined in Postgres; native SAS/ACCESS engines for databases such as Oracle or Microsoft SQL Server have different options and capabilities.  At first the load was taking forever because SAS was inserting one row at a time into the database table.  After adding a simple option, my data processing was off to the races!
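If you want to see what SAS is actually sending to the database, turn on the SASTRACE system option before running your load; it logs the calls the SAS/ACCESS engine makes, so a row-at-a-time insert pattern is easy to spot:

/* Log DBMS calls to the SAS log so you can spot row-at-a-time inserts */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

With tracing on, a slow load shows one insert call per row; after raising INSERTBUFF you should see far fewer, much larger batches.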

RDBMS Buffering Options

The SAS INSERTBUFF and READBUFF options can dramatically improve performance for ODBC and OLE DB libraries.  For ODBC connections they default to 1 row and 250 rows respectively.  INSERTBUFF tells SAS how many rows to buffer in memory before sending an insert to the database; READBUFF tells SAS how many rows to retrieve from the database in a single fetch.

Using the LIBNAME Option

These options can be added to the LIBNAME statement to set the buffer sizes for all processing done on tables within the library.  Ideally, if you have a SAS Metadata Server running, your SAS administrator should set these options through the Data Library Manager in SAS Management Console.

If you are using Base SAS or writing code in SAS Enterprise Guide, you can also write the LIBNAME statement yourself like this:

LIBNAME pgsgf13 ODBC
  DBCOMMIT=10000     /* commit after every 10,000 rows */
  READBUFF=30000     /* fetch up to 30,000 rows per read */
  INSERTBUFF=30000   /* send up to 30,000 rows per insert */
  DATASRC=sasgf13
  SCHEMA=public;

Be sure to check out SAS support for more information on the INSERTBUFF and READBUFF options for the LIBNAME statement.
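Once the library is assigned, any read or load through it picks up the buffering automatically. As a quick illustration (WORK.FACTS is a made-up table name, not from the original example):

/* Bulk-load a SAS table into Postgres through the buffered library */
proc append base=pgsgf13.facts data=work.facts;
run;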

Using a SAS Data Set Option

You can also explicitly define these buffer options for an individual table by using data set options in your code.   This can come in handy depending on the type, size, and width of the data you plan to insert.

LIBNAME pgsgf13 ODBC DATASRC=sasgf13 SCHEMA=public ;
data pgsgf13.test (dbcommit=10000 insertbuff=10000 readbuff=10000);
  /* data step logic goes here */
run;

Be sure to check out SAS support for more information on the INSERTBUFF and READBUFF data set options.
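The same data set options work on the read side too. For example, to speed up a large extract (BIGTABLE is a placeholder name):

/* Pull a large Postgres table with a bigger fetch buffer */
data work.extract;
  set pgsgf13.bigtable (readbuff=30000);
run;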

Considerations

These options deserve careful consideration.  The optimal setting depends on your SAS compute server resources and network capacity.  The number of rows to buffer should be much smaller for very wide tables with lots of character data, because of the physical byte size of character columns and the overall width of the table.  In my project I am using very skinny fact tables of numeric data, which require only 8 bytes per numeric column.  Assuming I have 10 numeric columns, that’s only about 80 bytes of data per row.  For my data step, which creates a huge volume of data, I could theoretically set INSERTBUFF much larger, but SAS does have a hard limit of 32,767 rows it can buffer in memory 🙂 .
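As a rough back-of-the-envelope heuristic (my own rule of thumb, not an official SAS formula), you can size the buffer from the row width and a memory budget:

/* Rough heuristic: fit the buffer in a ~4 MB budget, capped at 32,767 rows */
data _null_;
  row_bytes = 10 * 8;                /* 10 numeric columns x 8 bytes each */
  budget    = 4 * 1024 * 1024;       /* assumed 4 MB buffer budget */
  buff      = min(32767, floor(budget / row_bytes));
  put 'Suggested INSERTBUFF: ' buff comma8.;
run;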

Learn all about SAS BI

You can learn more about SAS Business Intelligence from the "SAS BI Bible." Take a peek inside the Building Business Intelligence with SAS book.


9 Comments

  • Thanks SASKiwi … good information!

  • SASKiwi says:

    Actually for ODBC Pass-through at least, READBUFF is a supported option. Check the SAS documentation. I use it all the time and tend to just set the maximum allowed value of 32767. This appears to work well with most tables that do not have large numbers of columns.
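    For example (the data source and table names are placeholders):

    /* Pass-through pull with READBUFF set on the connection itself */
    proc sql;
      connect to odbc (datasrc=sasgf13 readbuff=32767);
      create table work.pull as
        select * from connection to odbc
          ( select * from public.test );
      disconnect from odbc;
    quit;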

  • Sounds cool .. any more details?
    Is it open source?

    What kind of speed have you seen?

  • Hello says:

    Pass-through queries do not use a libname, as they submit the code directly to the database in question… so you are not using a LIBNAME statement, as that is a SAS thing!!! There may, however, be connection string options you can pass in your connect string to improve the performance of the data pull.

  • Hello says:

    You should take a look at Hyperstage to see what a really fast database can do. Something to look at if you want really fast load and retrieval!!!

  • Steve Overton says:

    Agreed, the SAS administrator should definitely manage these settings through the library definition in metadata if you do not know how to best configure these settings. This tip comes from my experience doing development work on my own server. In the real world this will be very different!

  • Chris S. says:

    What I never got was how you were supposed to figure out how to set these parameters. I suppose you could complete a bunch of tests and vary the numbers systematically, in the end selecting the best response time for, say, 100,000 records. That seems a bit complicated for an analyst to complete, and it should really be done by an administrator and set or suggested to all users.

  • Steve Overton says:

    These options will not work for pass-through queries, since SAS passes the SQL code directly to the third-party database, Oracle in your case. Also, these commands tell SAS to buffer data before sending it to the target database; passing the code through bypasses this altogether.

  • Quentin says:

    Nice post, and it comes at a good time for me as I just started working on a DI Studio job that will do big nightly pulls from an Oracle database.

    I’m using a pass-through query. Assume these options work for pass-through queries as well?

    Thanks!
    –Quentin