BI Tools

Tips and tricks for building information maps, OLAP cubes, reports, and dashboards

BI Admin

Learn your way around a SAS BI installation.

Visual Analytics

Learn your way around the SAS Visual Analytics tool

Coding & Data

Extract, transform, and load your data into the SAS BI toolset

Stored Processes

Create and design stored processes like a rock star

Home » Coding & Data, General

BulkLoading with PostgreSQL available in SAS 9.4

Submitted by on 2013-10-22 – 8:04 AM One Comment

The recent release of SAS 9.4 was very exciting for me because it finally brings PostgreSQL support!  In my opinion, PostgreSQL is one of the best low-cost database platforms out there.  It has a ton of enterprise level features, it’s fast and highly configurable, and best of all its FREE.  Gotta love open source!  I’m very happy to see that SAS has invested in PostgreSQL as well.  From the looks of it, the Web Infrastructure Platform is now built on PostgreSQL by default in SAS 9.4.  

PostgreSQL Elephant

 

Bulk Loading Big Data with PostgreSQL

For loading large quantities of BIG data into traditional databases, I highly suggest using the bulk load option.  Many SAS Access engines support bulk loading.  In essence, SAS doesn’t actually load the data directly into the target database.  Data is staged into a flat file, then SAS uses the native database client to bulk load into the target database.   In this post I will focus only on PostgreSQL.

Configure PostgreSQL Client on Compute Tier

Before you can bulk load with PostgreSQL, SAS needs to have access to a PostgreSQL client installed where ever SAS is running (typically a server, aka compute tier).  For PostgreSQL, the COPY command is used for bulk loading.  If you’re like me, you want to know what runs behind the scenes, well that’s it!

If you love and run SAS on Ubuntu, its easy to get the PostgresSQL client with this command 🙂

sudo apt-get install postgresql-client

For other distributions of Linux/Unix operating systems and Windows, the PostgreSQL Client can be downloaded and installed by your IT systems administrator.  

Bulk Loading with Insert or Append

In this example snippet of code, I’m inserting random sample data used to demonstrate concepts in my SAS Global Forum 2013 paper.  Prior steps in the code stage up the data to fit the schema of the target table (fact_transactions).

proc sql;
insert into defiant.fact_transactions
(bulkload=yes
bl_datafile="/tmp/data"
bl_logfile="/tmp/bulkload_log"
bl_delete_datafile=NO)
select * from stg_fact_transactions;
quit;
Common Bulk Loading Parameters
  • bulkload – obviously tells SAS to use the bulkloader (YES/NO)
  • bl_datafile – specify a filename on the compute tier which is used to stage the data for bulk loading
  • bl_logfile – specify a log file in case the native bulk load utility from the ‘psql’ command returns anything
  • bl_delete_datafile – tells SAS to delete the temporary data after everything is done

You can also use PROC APPEND to accomplish the same.  Both methods accomplish the same thing in about the same amount of time.  Use your own discretion or preference.

proc append base=defiant.fact_transactions
( bulkload=yes
bl_datafile="/tmp/data"
bl_logfile="/tmp/bulkload_log"
BL_LOAD_METHOD=APPEND
)
data=stg_fact_transactions;
run;

I must give the usual disclaimer, mileage may vary for different environments.  Depending on how much data is being loaded and how the target database is configured, I’m 99.9% confident bulk loading will improve load times for BIG DATA.  Check out postgresql.org for more performance suggestions on populating a database.

Never miss a BI Notes post!

Click here for free subscription. Once you subscribe you'll be asked to confirm your subscription through your email account. You email address is kept private and you can unsubscribe anytime.
 
 
The following two tabs change content below.

Tags:

One Comment »

  • Andrew Dunstan says:

    I usually recommend installing the community’s distribution rather than the one from Ubuntu. See the Postgres download page for details.