BulkLoading with PostgreSQL available in SAS 9.4
The recent release of SAS 9.4 was very exciting for me because it finally brings PostgreSQL support! In my opinion, PostgreSQL is one of the best low-cost database platforms out there. It has a ton of enterprise level features, it’s fast and highly configurable, and best of all its FREE. Gotta love open source! I’m very happy to see that SAS has invested in PostgreSQL as well. From the looks of it, the Web Infrastructure Platform is now built on PostgreSQL by default in SAS 9.4.
Bulk Loading Big Data with PostgreSQL
For loading large quantities of BIG data into traditional databases, I highly suggest using the bulk load option. Many SAS Access engines support bulk loading. In essence, SAS doesn’t actually load the data directly into the target database. Data is staged into a flat file, then SAS uses the native database client to bulk load into the target database. In this post I will focus only on PostgreSQL.
Configure PostgreSQL Client on Compute Tier
Before you can bulk load with PostgreSQL, SAS needs to have access to a PostgreSQL client installed where ever SAS is running (typically a server, aka compute tier). For PostgreSQL, the COPY command is used for bulk loading. If you’re like me, you want to know what runs behind the scenes, well that’s it!
If you love and run SAS on Ubuntu, its easy to get the PostgresSQL client with this command 🙂
sudo apt-get install postgresql-client
For other distributions of Linux/Unix operating systems and Windows, the PostgreSQL Client can be downloaded and installed by your IT systems administrator.
Bulk Loading with Insert or Append
In this example snippet of code, I’m inserting random sample data used to demonstrate concepts in my SAS Global Forum 2013 paper. Prior steps in the code stage up the data to fit the schema of the target table (fact_transactions).
insert into defiant.fact_transactions
select * from stg_fact_transactions;
Common Bulk Loading Parameters
- bulkload – obviously tells SAS to use the bulkloader (YES/NO)
- bl_datafile – specify a filename on the compute tier which is used to stage the data for bulk loading
- bl_logfile – specify a log file in case the native bulk load utility from the ‘psql’ command returns anything
- bl_delete_datafile – tells SAS to delete the temporary data after everything is done
You can also use PROC APPEND to accomplish the same. Both methods accomplish the same thing in about the same amount of time. Use your own discretion or preference.
proc append base=defiant.fact_transactions
I must give the usual disclaimer, mileage may vary for different environments. Depending on how much data is being loaded and how the target database is configured, I’m 99.9% confident bulk loading will improve load times for BIG DATA. Check out postgresql.org for more performance suggestions on populating a database.
Never miss a BI Notes post!
Click here for free subscription. Once you subscribe you'll be asked to confirm your subscription through your email account. You email address is kept private and you can unsubscribe anytime.
Latest posts by Steve Overton (see all)
- BulkLoading with PostgreSQL available in SAS 9.4 - 2013-10-22
- Everything Still Starts and Ends with Data in SAS Visual Analytics - 2013-07-30
- Describe Your Table in SAS to Write the SQL Code - 2013-05-22
- Update Table Metadata Using Base SAS Code - 2012-12-19
- SAS Administration: Fetch Data Faster across ODBC - 2012-11-01