5

Intro note: I'm hoping a library or routine exists to do this, but I haven't been able to find anything like this. I'm really looking for direction and advice on where to start...

Here is the situation: I have a block of SQL commands coming as plain text. It might be one SQL commands or several. I need a way to split multiple SQL commands so I can run them one at a time. Microsoft SQL Management Studio does has this behavior out of box.

I'm trying to add this functionality to a PHP5/MySQL5 application running on Apache (Debian).

Some important points:

  1. I really do need to run them one at a time. Seriously.
  2. I don't want to require the user to enter a semi-colon after each SQL statement.
  3. SQL statements can be on one or multiple lines, so I can't wrap on LBs/CRs
  4. It needs to support SELECT, UPDATE, INSERT, DELETE at least.
  5. It needs to support queries that are sub-selects
  6. Neatly tabbed SQL needs to work
  7. (In the interest of usable software) I do not want to force the user to enter in any kind of delimiter.

Here is an example block of SQL I need to split into two statements:

select sMessage, 
(
    SELECT COUNT(sTag) FROM Tags WHERE ixTicket = note.ixTicket
) FROM note
select * from ticket
    WHERE (SELECT MAX(nCount) FROM Counter WHERE ixTicket = ticket.ixTicket) > 5

I tried some RegEx attempts, but that doesn't seem to be powerful enough.

Any recommendation on an approach to tackle this problem?

4
  • Point 7 is really making this a bear ... even Oracle and SQL Server require some type of delimiter between statements (;, GO, \, etc). This is EXCEEDINGLY difficult w/o a delimiter (e.g. just think about all the places a SELECT can go - plus you have UNIONs and similar statements to contend with Commented Mar 11, 2009 at 3:34
  • BTW - SQL Server Management studio does require a "GO" or ";" between multiple statements ... just not for 1. You're asking for an easy way to do something so difficult that even MS and Oracle don't provide it. Commented Mar 11, 2009 at 5:17
  • @LuckLindy : "SQL Server Management studio does require a "GO" or ";" between multiple statements ... just not for 1." That is actually wrong. Just FYI. Open SQL Studio and check it out. Commented Apr 24, 2009 at 2:09
  • See also stackoverflow.com/questions/1883079/… Commented Jan 6, 2010 at 7:25

11 Answers 11

3

I'm not sure this is possible at all. You would certainly need an in-depth knowledge of the SQL syntax of your target DBMS. For example just off the top of my head this is a single MySQL statement:

INSERT INTO things
SELECT * FROM otherthings ON DUPLICATE KEY
UPDATE thingness=thingness+1

It is likely there are constructs in some DBMSs that, without a delimiter, could be ambiguous.

I don't want to require the user to enter a semi-colon after each SQL statement.

I think you may be forced to. It's totally the standard way to delimit SQL statements. Even if you can find a heuristic to detect probably-start-of-SQL-statement points, you risk disasters like an accidental “DELETE FROM things”-without-WHERE-clause.

SQL statements can be on one or multiple lines, so I can't wrap on LBs/CRs

Would double-newline-for-new-statement be acceptable?

I tried some RegEx attempts, but that doesn't seem to be powerful enough.

No, even with semicolon delimiters, regex is nowhere near powerful enough to parse SQL. Problem points would include:

';'
";"
`;`
'\';'
''';'
-- ;
#;
/*;*/

and any interposition of these structures. Eek!

Sign up to request clarification or add additional context in comments.

1 Comment

All good points, but I don't want to require a delimiter. It's very possible and safe to parse it without, just look @ SQL Management Studio. I didn't say it was going to be easy.
1

Maybe try this library. I have used it successfully for parsing sql in the past. http://www.sqlparser.com/

1 Comment

I'll check this out. I need to do this in code, so I'm not sure that would work, but I'll look tomorrow.
1

To add a quirk to the discussion that periodically causes issues:

DECLARE c CURSOR FOR
    SELECT * FROM SomeWhere ...
        FOR UPDATE

The trailing UPDATE tends to throw ad hoc parsers off their stride. It may well be that you don't have to worry about that because the DECLARE notation (which is really Embedded SQL, not plain SQL) is not permitted in the first place. But the FOR UPDATE clause can appear in some dialects of SQL even when not in a DECLARE statement, so beware.

Comments

1

maybe with the following Java Regexp? check the test...

@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";

    String regexp = "([^;]*?('.*?')?)*?;\\s*";

    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}

Comments

1
    $sMultiQuery = 'SHOW TABLES; SELECT * FROM `test`';
    $aQueries    = array();

    if ( preg_match_all('/([^;]*?((\'.*?\')|(".*?"))?)*?(;\s*|\s*$)/', $sMultiQuery, $aMatches) )
    {
        $aQueries = $aMatches[0];
    }
    else
    {
        $aQueries = array($sMultiQuery);
    }

    foreach ( $aQueries as $sQuery )
    {
        # Do your thing
    }

Comments

1

If anybody needs a javascript / typescript function to split SQL statements here you go

const splitSqlStatements=(statements:string):string[]=>{
    const split:string[]=[];
    let inside:'"'|"'"|'-'|null=null;
    let s=0;
    for(let i=0;i<statements.length;i++){
        const char=statements[i] as string;

        if(inside){
            if(inside==='-'){
                if(char==='\n'){
                    inside=null;
                    s=i+1;
                }
            }else if(char===inside){
                if(statements[i+1]===inside){
                    i++;
                }else{
                    inside=null;
                }
            }
        }else if(char==='"'){
            inside='"';
        }else if(char==="'"){
            inside="'";
        }else if(char==='-' && statements[i+1]==='-'){
            const v=statements.substring(s,i).trim();
            if(v){
                split.push(v);
            }
            inside='-';
            i++
        }else if(char===';'){
            const v=statements.substring(s,i).trim();
            if(v){
                split.push(v);
            }
            s=i+1;
        }
    }
    if(s!==statements.length-1 && inside!=='-'){
        const v=statements.substring(s).trim();
        if(v){
            split.push(v);
        }
    }
    return split;
}

Comments

0

Your best bet is to require the user to put some type of deliminator between statements. For example: require each statement be delineated with a line containing only the word GO, or a "\", or end each statement with a ";".

This way you can easily break the single string into separate SQL statements.

1 Comment

Added point 7. Forgot to mention I definitely don't want delimiter.
0

If you don't want your users to put in a delimiting character such as ';' or any thing else, you will need to parse the input yourself and have logic to determine where statements begin.

Your logic will need to deal with the obvious query starting keywords 'SELECT', 'UPDATE', 'INSERT', 'DELETE' and work forward to the next keyword (or end of input).

2 Comments

I have recently just worked on a SQL Parser. Initially I thought like you did that it should be relatively straight forward but don't be fooled. Even with the help of a SQL Parser third party component I still had to write 600 lines of code to do some pretty simple parsing.
Yah, I know I can write my own routine to do this. But it's hideous every time I try and fails.
0

Have you tried using the keywords 'SELECT', 'UPDATE', 'INSERT' and 'DELETE' combined with counting the number of opening '(' and closing braces ')' ?

This should allow you to determine avoid nested SELECT statements and find the correct end of the statement.

1 Comment

Yes, I did. The code got hideous and long, and I constantly found use cases that broke it, so I figured there must be a more flexible way.
0

You need to require the semicolon delimiter. Technically, without it a SQL statement is completely invalid; anyone omitting it is writing malformed SQL. Requiring the semicolon solves all of your problems, in a standardized way, and makes the software easy to write.

Perhaps do the following: if the user enters a query not containing one or more semicolons (outside of quotes, of course), add a semicolon at the end and run it as a single query. Otherwise, split the entered queries at semicolons and run each one individually, perhaps tacking on a semicolon at the end of the final query if omitted.

This solution is easy to write, SQL standard compliant, and just plain works. Not requiring the delimiter is a sure path to madness.

7 Comments

"Not requiring the delimiter is a sure path to madness." I totally disagree. That's one of the best features in MS SQL Management Studio.
If LuckyLindy's comment above is to be believed, even SQL Management Studio uses the approach I describe. Not requiring delimiters will require you to write a full SQL parser, as complex as the server's itself. Don't do it. 'Saving' users the 'trouble' of semicolons will only hurt in the long run.
@kquinn - SQL Server Management Studio does not require a delimiter.
You're still crazy for demanding this. The alternative is simple and easy. Your demand for no delimiters is difficult, error-prone, and fragile.
@kquinn. Respectfully, I disagree. SQL Server Management Studio does not require delimiters and it is not an error-prone or fragile application. The idea is to design software for how people will tend to behave, not how I want them to behave. Look, I'm not saying this is easy, and I'm certainly not going to try to write my own SQL parser in PHP, but it's still worth trying to find a way.
|
-1

You could parse it yourself I suppose. Look for the keywords SELECT, DELETE, UPDATE, INSERT, EXEC, etc.

As you parse, if you encounter a "(" increment a counter: nest_level++

If you encounter a ")" decrement nest_level--

Then when you come across a keyword, and nest_level == 0, then you've come to the next statement.

You'll also have to handle cases like

 INSERT ...
 SELECT ....

So for an INSERT you would have to look for either SELECT or VALUES...

And no doubt other cases.

Agree with kquinn you should just require the semicolon. I don't think there's anything "uncool" about that.

1 Comment

Yah, these are all the traps that I caught in trying to write my own algorithm.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.