Sunday, June 17, 2007

ETL

ETL is the process of Extraction, Transformation and loading. In fact it is a fundamental phase in creating and maintaining Operational Data Stores (ODS), data marts and data warehouses.
The are tools in the market that helps in executing this phase, like informatica and Oracle Warehouse Builder (OWB). Sometimes, and for non complex ETL tasks, one might choose to run the run SQL script manually.
Whether this is done for the above mentioned objective, or any other objective, ETL could potentially be a time consuming processes especially when the data being loaded is not clean.
Foregin key violation, Primary key violation, missing data, invalid data etc.
When you run SQL script against Oracle database, any of the above violation causes the termination of the script and the rollback of the entire statement. The part that is frustrating is that the Rollback process can talk really long time to complete and undo the changes. You are expected to wait for the rollback to complete, then fix the problem and run again. This time the statement could fail for another violation and you are again expected to repeat the cycle. If you are working with small data sizes, this could be tolerable, but when you are loading millions of records, this could really have an adverse effect on the project.
Oracle 10g Rel 2 offers a new feature that allows you to continue loading your data (albeit of the errors) while keeping the violating rows in a separate ERROR LOG table. This give you the chance to only examine the violating rows while not having to repeat the same cycle with every error

The following two statement are needed

dbms_errlog.create_error_log('REALSOFT','ERROR_LOG_REALSOFT') ;

ERROR_LOG_REALSOFT REJECT LIMIT UNLIMITED;

I have included an exhaustive and simple example
please visit http://www.e-ammar.com/Oracle.htm

No comments:

Getting Ref of the View Object referenced by the current Iterator binding for One iterator page without knowing the name of the iterator

Getting Ref of the View Object referenced by the current Iterator binding for One iterator page without knowing the name of the iterator ...