I have a requirement to upload a file to a database. The file will have approximately 100K records daily, and once per month it will have 8 to 10 million records.

There are also some field-level validations to be performed.

The validations are things like: are all fields present, does a number field contain a valid number, does a date field contain a valid date, is a number within a specified range, does a string match the expected format, etc.
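As a rough illustration, checks of this kind can be written as a small per-row function. The sketch below is in Python rather than C# purely for brevity; the three-column layout (`id`, `amount`, `trade_date`), the 0 to 1,000,000 range, and the `AB-1234` id pattern are all made-up assumptions, not the real file format.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical layout: id, amount, trade_date (all arrive as strings).
ID_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")  # assumed format, e.g. "AB-1234"


def validate_row(fields):
    """Return a list of error messages; an empty list means the row is valid."""
    # Presence check first: every expected field must exist and be non-blank.
    if len(fields) != 3 or any(f.strip() == "" for f in fields):
        return ["missing field"]

    errors = []
    row_id, amount, trade_date = (f.strip() for f in fields)

    # String format check.
    if not ID_PATTERN.fullmatch(row_id):
        errors.append("id format")

    # Numeric check, then range check (assumed range 0..1,000,000).
    try:
        value = Decimal(amount)
        if not (Decimal(0) <= value <= Decimal(1_000_000)):
            errors.append("amount out of range")
    except InvalidOperation:
        errors.append("amount not a number")

    # Date check (assumed ISO format YYYY-MM-DD).
    try:
        datetime.strptime(trade_date, "%Y-%m-%d")
    except ValueError:
        errors.append("invalid date")

    return errors
```

Returning all errors for a row, rather than stopping at the first, is what would let an error column be populated in one pass.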

There are 3 ways.

1: Upload to temp and then validate
- Create a temp table (all string columns) with an extra error column
- Upload all entries to the temp table
- Run validation, populating the error column where needed
- Move valid entries to the correct table

Cons: every entry has to be written to the db twice, even the correct ones.

2: Upload to db directly
- Upload all entries directly to the table
- Check which entries were not uploaded

Cons: I would need to read each line again after the upload, so it is as good as a double read.

3: Validate and then upload
- Read each line, run all validations on all columns
- If valid, write it to the db

Cons: reading and validating the file line by line must be slower than a bulk upload to the db.
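Option 3 does not have to mean one INSERT per line: valid rows can be collected and written in batches. A hedged sketch is below, using Python with an in-memory SQLite database standing in for Oracle; the `destination` table, the two-column layout, the `is_valid` rule, and the batch size are all assumptions made for the example. In C# the batching would go through whatever array-binding or bulk-copy facility the Oracle driver offers, which is not shown here.

```python
import csv
import io
import sqlite3


def is_valid(row):
    # Assumed rule set: two fields, a non-empty name and a non-negative integer.
    return (
        len(row) == 2
        and row[0] != ""
        and row[1].isascii()
        and row[1].isdigit()
    )


def load_valid_rows(lines, conn, batch_size=1000):
    """Stream rows, keep the valid ones, and insert them in batches.

    Returns the 1-based line numbers of rejected rows.
    """
    rejected = []
    batch = []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if is_valid(row):
            batch.append((row[0], int(row[1])))
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO destination VALUES (?, ?)", batch)
                batch.clear()
        else:
            rejected.append(line_no)
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO destination VALUES (?, ?)", batch)
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE destination (name TEXT, qty INTEGER)")
sample = io.StringIO("apple,3\npear,x\n,5\nplum,12\n")
rejected_lines = load_valid_rows(sample, conn, batch_size=2)
loaded = conn.execute("SELECT COUNT(*) FROM destination").fetchone()[0]
```

The file is still read only once this way; whether it beats a bulk load plus SQL validation at 8 to 10 million rows is something that would have to be measured.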

I am writing the app in C# and ASP.NET; the DB is Oracle.

Which of the 3 ways is best?

  • Validate how, and what? That something that's supposed to be numeric doesn't contain non-digits? That a field is within a certain range (not specified by a check constraint)? Option 1 is probably going to require a stored procedure or 'external' code. Option 2 may be runnable as a plain SQL statement; there's also some really basic validation you can do on insert (however, it usually terminates immediately if there's an error, so not all valid rows would be written). Option 3 will be particularly slow, and probably needs to be run 'externally'. Commented Jan 9, 2012 at 16:41
  • Validations are like this. For a numeric field: check that it is a number, that it is in range, that it is positive, etc. For a text field: check the field format, that it is all characters, etc. Commented Jan 9, 2012 at 16:45

2 Answers

I'll go with option 2.

100k rows are peanuts for a bulk load followed by query-based validation.

2 Comments

There is one particular upload each month that will have approximately 8 to 10 million rows.
The validation must be done on the SQL side. Reading and validating 10 million rows in C# will take a long time. Choosing between options 1 and 2 is your call, but either way you still have to read all the records and validate them.

As @aF says, option 2, with the following addition:
Add a table that you can dump 'invalid' rows into. Then, run a statement like this:

INSERT INTO InvalidData
SELECT *
FROM InputData
WHERE restrictedColumn NOT IN ('A', 'B')
OR NOT REGEXP_LIKE(numberColumn, '^[0-9]+$')  -- Oracle has no built-in IS_NUMERIC; a regex is one way to test for digits

then dump 'validated' rows into your actual table, excluding 'invalid' rows:

INSERT INTO Destination
SELECT a.*
FROM InputData a
WHERE NOT EXISTS (SELECT 1
                  FROM InvalidData b
                  WHERE b.id = a.id)  -- anti-join: keep only rows not flagged as invalid

The INSERT will fail if any other 'invalid' data is encountered (something the checks above missed), but that data should be discoverable. The rows in the 'invalid' table can then be cleaned up and re-inserted.
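Put together, the staging flow might look roughly like this. The sketch uses Python with an in-memory SQLite database as a stand-in for Oracle so that it can run anywhere; the sample rows, the allowed codes 'A' and 'B', and the digits-only check (SQLite's GLOB) are illustrative assumptions, and on Oracle the numeric test would need a different expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InputData   (id INTEGER PRIMARY KEY, restrictedColumn TEXT, numberColumn TEXT);
    CREATE TABLE InvalidData (id INTEGER, restrictedColumn TEXT, numberColumn TEXT);
    CREATE TABLE Destination (id INTEGER, restrictedColumn TEXT, numberColumn INTEGER);
""")

# Stand-in for the bulk load into the all-text staging table.
conn.executemany(
    "INSERT INTO InputData VALUES (?, ?, ?)",
    [(1, "A", "10"), (2, "C", "20"), (3, "B", "x1"), (4, "B", "7")],
)

# Step 1: copy every row that fails a check into InvalidData.
conn.execute("""
    INSERT INTO InvalidData
    SELECT * FROM InputData
    WHERE restrictedColumn NOT IN ('A', 'B')
       OR numberColumn = ''
       OR numberColumn GLOB '*[^0-9]*'   -- contains a non-digit character
""")

# Step 2: move the remaining rows, converting text to the real column type.
conn.execute("""
    INSERT INTO Destination
    SELECT a.id, a.restrictedColumn, CAST(a.numberColumn AS INTEGER)
    FROM InputData AS a
    WHERE NOT EXISTS (SELECT 1 FROM InvalidData AS b WHERE b.id = a.id)
""")
conn.commit()

invalid_ids = [r[0] for r in conn.execute("SELECT id FROM InvalidData ORDER BY id")]
valid_rows = conn.execute(
    "SELECT id, numberColumn FROM Destination ORDER BY id"
).fetchall()
```

With the sample data, rows 2 and 3 land in InvalidData and rows 1 and 4 reach Destination. Both steps are single set-based statements, so the database does the validation work rather than the application.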

2 Comments

If I understand correctly, I should upload the file to the InputData table, then run validations, then move invalid rows to the InvalidData table and valid rows to Destination. But this would involve writing each record twice: first to InputData and then to InvalidData/Destination. Is there a way to avoid writing the data to the db twice, or is this the best we can achieve?
Most RDBMSs I'm aware of have some sort of import utility that allows for mass uploads. The problem is that they're usually not able to perform any validation; they do little more than dump the information into those tables. However, they're extremely fast compared to single (or even normal blocked) inserts. They do fail if they encounter 'invalid for column' data (e.g., non-digit characters for a numeric column), but (as far as I know) not much, if anything, more. So, import to the (all columns char) 'temp' table, then pull rows out.