
In addition to my question about the best solution for CSV import, I am looking for validation methods that give me qualified errors using MySQL only. Short description:

I import data with LOAD DATA INFILE into a temporary table, and then I need to validate this data.

The fastest way would be a SELECT with several filters, like:

SELECT * from temp_table WHERE col1 not in (1,2) OR col2 REGEXP '[0-9]+' etc.

That works. But I would like to know the fastest way to record which column caused a row to be matched, maybe with CASE and a counter or something similar?

So that I can then return a qualified error like: "column x can only consist of y"

  • Why not validate the data before loading it into the database? This would be much easier with a script that only sends the appropriate rows. Commented Jul 25, 2013 at 7:01
  • Validate the data before inserting it into the database. Commented Jul 25, 2013 at 7:02
  • Yes, I do this at the moment, but then I cannot use LOAD DATA INFILE? And I would like to avoid row-by-row validation because of the bad performance I described in the CSV import thread. Commented Jul 25, 2013 at 7:40

1 Answer

You could put a trigger on the temp_table before loading the data from the CSV-file.

The trigger would either insert a row into the permanent table (if it passes validation) or into an error table if it doesn't. Something like this should work.

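DELIMITER //

-- Note: MySQL does not allow triggers on TEMPORARY tables,
-- so temp_table must be a regular (staging) table.
CREATE TRIGGER validating_insert
AFTER INSERT ON temp_table FOR EACH ROW
BEGIN
  -- The condition matches invalid rows, so those go to error_table.
  IF NEW.col1 NOT IN (1,2) OR NEW.col2 REGEXP '[0-9]+' THEN
    INSERT INTO error_table VALUES (NEW.col1, NEW.col2, 'any reason goes here');
  ELSE
    INSERT INTO permanent_table VALUES (NEW.col1, NEW.col2);
  END IF;
END//

DELIMITER ;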
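
To get an error message that names the failing column, the single condition can be split into one check per column. The following is only a sketch under assumptions: the trigger name and the message texts are made up for illustration, the rules are taken literally from the filter in the question (so col2 is treated as invalid when it contains a digit), and error_table is assumed to have a text column for the reason as its third column. It would be used in place of the trigger above, not in addition to it.

```sql
DELIMITER //

-- Hypothetical variant of the trigger that records which column(s) failed.
CREATE TRIGGER validating_insert_with_reason
AFTER INSERT ON temp_table FOR EACH ROW
BEGIN
  -- Stays NULL as long as no rule has been violated.
  DECLARE reason VARCHAR(255) DEFAULT NULL;

  -- CONCAT_WS skips NULL arguments, so the first message is stored
  -- as-is and later messages are appended with '; ' in between.
  IF NEW.col1 NOT IN (1, 2) THEN
    SET reason = CONCAT_WS('; ', reason, 'col1 can only be 1 or 2');
  END IF;

  IF NEW.col2 REGEXP '[0-9]+' THEN
    SET reason = CONCAT_WS('; ', reason, 'col2 must not contain digits');
  END IF;

  IF reason IS NULL THEN
    INSERT INTO permanent_table VALUES (NEW.col1, NEW.col2);
  ELSE
    INSERT INTO error_table VALUES (NEW.col1, NEW.col2, reason);
  END IF;
END//

DELIMITER ;
```

Because each rule has its own IF, a row that breaks several rules gets all of the messages in one error_table entry.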

4 Comments

Cool idea, I haven't worked with triggers until now. I add the trigger when creating the temp table, right? But in your case I do not get "qualified" errors telling me in which column an error occurred.
You can write more advanced statements inside the trigger to get more qualified errors if you need.
I do not agree. When you use a trigger that fires for each row, you could just as well validate before the insert. There is not much difference; you have overhead with the trigger too. I think a better way is to run one SELECT that filters out the rows you can then remove from the temp table. With WHERE and CASE you can track the column that caused the error.
@Ruven. True, triggers add a cost and it might be faster to do the filtering afterwards. before/after doesn't matter in this case as it will never fail. If you prefer before then go ahead... no difference. The trigger solution can make the logic much easier to read and that might be more important than insert speed.
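
The set-based alternative discussed in these comments could look roughly like the query below. It is a sketch only: the message texts and the error_reason alias are assumptions, and the rules are copied literally from the filter in the question.

```sql
-- Return every failing row together with the rule(s) it broke.
-- A CASE without ELSE yields NULL when its rule is satisfied,
-- and CONCAT_WS skips NULLs, so only broken rules are listed.
SELECT t.*,
       CONCAT_WS('; ',
         CASE WHEN t.col1 NOT IN (1, 2) THEN 'col1 can only be 1 or 2' END,
         CASE WHEN t.col2 REGEXP '[0-9]+' THEN 'col2 must not contain digits' END
       ) AS error_reason
FROM temp_table AS t
WHERE t.col1 NOT IN (1, 2)
   OR t.col2 REGEXP '[0-9]+';
```

This validates the whole table in a single pass after LOAD DATA INFILE, without any per-row trigger overhead. The cost is that each rule is written twice, once in the WHERE clause and once in the CASE expression.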
