
I have a database schema in an Oracle database. I also have data dumps from third-party vendors, which I load with SQL*Loader scripts on a Linux machine.

We also have batch updates every day.

The data is assumed to be free of errors. For example, if a record with key 'A' is inserted into the database on the first day, the assumption is that 'A' will never appear in a later load. If a later feed does contain 'A' again, we get a primary key violation.
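For illustration, this is the failure mode in minimal form (the table VENDOR_DATA and column DATA_ID are hypothetical names, not part of the actual schema):

    -- Hypothetical target table keyed on the vendor-supplied identifier.
    CREATE TABLE vendor_data (
        data_id  VARCHAR2(30) PRIMARY KEY,
        payload  VARCHAR2(400)
    );

    -- Day 1 load: succeeds.
    INSERT INTO vendor_data (data_id, payload) VALUES ('A', 'first delivery');

    -- A later load unexpectedly contains 'A' again and fails with
    -- ORA-00001: unique constraint violated.
    INSERT INTO vendor_data (data_id, payload) VALUES ('A', 'repeat delivery');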

Question: to avoid these violations, should we build an analyzer to detect such data errors, or is there a better solution?

1 Answer


I built an ETL system for a company that had daily feeds of flat files containing line-of-business transaction data. The data was supposed to follow a documented schema, but in practice there were many different kinds of violations from day to day and file to file.

We built staging tables in which every column was nullable and the VARCHAR2 columns were sized larger than should ever be needed, and we loaded the flat-file data into those staging tables using efficient bulk-loading utilities. Then we ran a series of data-consistency checks within the database to ensure that the raw (staged) data could be cross-loaded to the proper production tables.
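As a rough sketch of that approach (the staging table STG_VENDOR_DATA and the check queries below are illustrative assumptions, not the original system):

    -- Staging table: every column nullable, sizes deliberately generous.
    CREATE TABLE stg_vendor_data (
        data_id  VARCHAR2(400),
        payload  VARCHAR2(4000)
    );

    -- Check 1: keys duplicated within the new file itself.
    SELECT data_id
    FROM   stg_vendor_data
    GROUP  BY data_id
    HAVING COUNT(*) > 1;

    -- Check 2: keys that already exist in the production table.
    SELECT s.data_id
    FROM   stg_vendor_data s
    JOIN   vendor_data     p ON p.data_id = s.data_id;

Rows flagged by checks like these can be logged or routed to an error table for review rather than failing the whole load.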

Nothing got out of the staging table environment until all of the edits were passed.

The advantage of loading the flat files into staging tables is that you can use the RDBMS to perform set operations and easily compare new values with existing values from previous files, all without having to write special flat-file handling code.
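For example (continuing with the hypothetical table names above), the cross-load itself can be a single set operation that only moves keys not already present in production:

    -- Cross-load only rows whose keys are not already in the production table.
    INSERT INTO vendor_data (data_id, payload)
    SELECT s.data_id, s.payload
    FROM   stg_vendor_data s
    WHERE  NOT EXISTS (
        SELECT 1
        FROM   vendor_data p
        WHERE  p.data_id = s.data_id
    );

A MERGE statement works just as well if a repeated key should update the existing row rather than be skipped.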


1 Comment

Will try this one and let you know if I face any issues.
