Fix number of arguments / parameters within parentheses of SQL script

Question

I have a large SQL script file generated from XML. In some INSERT statements, an incorrect number of values is passed, causing the error "Column count doesn't match value count". I'd like to track down these errors. Since the SQL script file contains 300k rows, I'd like to write a script for that.

Is there any way to check the number of values within a statement like:

INSERT INTO table (
 one,
 two,
 three
)
VALUES (123, 'lorem', 'ipsum');

Any help is greatly appreciated.

I had a different question of a similar nature not too long ago; take a look and see if you can piece something together from it: stackoverflow.com/questions/38807810/… @glennjackman Yup, string values should be surrounded by single quotes in SQL.
– Jacek Trociński, Commented Nov 14, 2016 at 18:53

James K. Lowden · Accepted Answer · 2016-11-15 00:21:18Z

SQL is very difficult to parse. If your data is fairly simple and your SQL is fairly regular, you might be able to get away with using awk in the way you're hoping, as shown below. Personally, I would probably inspect the database for the inserted values and scan the script for them, or vice versa. Alternatively, insert a bunch of print statements into the script and see where the error message appears among their output.
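The print-statement idea might be sketched as follows. This is only an illustration under a few assumptions: add_markers is a hypothetical helper name, every statement starts on a line beginning with INSERT, and the SQL client echoes the result of a bare SELECT, so the last marker printed before the error points at the failing statement's line in the original file.

```shell
# add_markers FILE: hypothetical helper that writes FILE to stdout with a
# marker SELECT in front of every line that starts an INSERT statement.
add_markers() {
  awk -v q="'" '
    # NR is the line number in the original file.
    toupper($0) ~ /^[ \t]*INSERT/ { print "SELECT " q "marker: line " NR q ";" }
    { print }
  ' "$1"
}
```

For example, add_markers script.sql > marked.sql produces a copy to feed to the client instead of the original. A string value that spans lines and happens to start a line with INSERT would confuse this sketch.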

Hoping for the best in awk, let's give it the old college try:

$ cat dat
INSERT INTO table (
 one,
 two,
 three
)
VALUES (123, 'lorem', 'ipsum');
INSERT INTO table (
 one,
 three
)
VALUES (123, 'lorem', 'ipsum');

$ tr -d \\n < dat | sed 's/;/&\
/g' | awk -F '[()]' 'split($2, cols, /, /) != split($4, vals, /, /) {print}'
INSERT INTO table ( one, three)VALUES (123, 'lorem', 'ipsum');

With tr, we delete the newlines. With sed, we put each SQL statement (ending with ;) on its own line. With awk, we split each line using parentheses as delimiters, so that the columns are in $2 and the values are in $4. The split function returns the number of elements it found, using a comma followed by a space as the delimiter in both cases. If the two counts don't match, awk prints the line. The last line displayed is the output: the second statement is flagged because the column name two is missing.
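To make the comparison visible, the same pipeline can print both counts for every statement instead of only the mismatches. This is a small sketch, not part of the original command; count_fields is a hypothetical helper name, and it assumes the same regular layout as the sample in dat.

```shell
# count_fields FILE: hypothetical helper that prints the column count and
# the value count for each statement in FILE, followed by the statement.
count_fields() {
  tr -d '\n' < "$1" | sed 's/;/&\
/g' | awk -F '[()]' 'NF >= 4 {
    ncols = split($2, cols, /, /)   # column list between the first parentheses
    nvals = split($4, vals, /, /)   # value list between the second parentheses
    printf "%d cols, %d vals: %s\n", ncols, nvals, $0
  }'
}
```

Running count_fields dat on the sample reports 3 cols, 3 vals for the first statement and 2 cols, 3 vals for the second.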

This could return some false positives, which in your case might not be terrible. If the data values contain semicolons, commas, or parentheses, the splitting will be wrong. If the INSERT doesn't mention column names, it will be wrong. If there are non-INSERT statements, you'll have to filter them out, or deal with them differently.
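Some of these false positives can be reduced with a little more awk. The variant below is a sketch under stated assumptions, not a real SQL parser: check_inserts is a hypothetical helper name, string values use single quotes (a doubled '' inside a string is fine, a backslash-escaped quote is not), and only the first VALUES tuple of a statement is checked. It skips non-INSERT statements and INSERTs without a column list, and it replaces each quoted string with a placeholder before counting, so commas and parentheses inside strings are ignored. Semicolons inside strings still break the statement splitting done by sed.

```shell
# check_inserts FILE: hypothetical helper that prints every INSERT statement
# whose column count differs from the count of its first VALUES tuple.
check_inserts() {
  tr -d '\n' < "$1" | sed 's/;/&\
/g' | awk -F '[()]' -v q="'" '
    BEGIN { quoted = q "[^" q "]*" q }          # regex for one quoted string
    toupper($0) !~ /^[ \t]*INSERT/ { next }     # skip non-INSERT statements
    {
      stmt = $0                                 # keep the original for printing
      gsub(quoted, "S")                         # rewrites $0, so fields are re-split
      if (NF < 5) next                          # no explicit column list
      if (split($2, cols, ",") != split($4, vals, ",")) print stmt
    }'
}
```

For example, check_inserts script.sql prints nothing for VALUES (1, 'x, y') against two columns, but still flags VALUES (1, 'x', 'y').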

Leander Over a year ago

Thanks a lot! This really helped. I used both of your suggestions. I had to split the INSERT statements as some of them had a size of ~3000 lines.