8

Given a string that may contain one or more SQL statements separated by ";", for example:

String sql = "select * from table1 where col1 = 'abc;de'; select * from table2;";

I need to obtain in an array of strings the statements:

array[0] = "select * from table1 where col1 = 'abc;de';"
array[1] = "select * from table2;"

Note that semicolons could appear between apostrophes.

For example, using regex:

String regex = "???";  // <--- I can't figure out this one
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sql);

What would be the regex to accomplish this?

2 Answers 2

16

You can try the following regular expression:

\s*;\s*(?=([^']*'[^']*')*[^']*$)

Visual Regex

Here is the example:

public static void main(String[] args) {
    String input = "select * from table1 where col1 = 'abc;de'; select * from table2;";

    System.out.println(java.util.Arrays.toString(
        input.split("\\s*;\\s*(?=([^']*'[^']*')*[^']*$)")
    )); // prints "[select * from table1 where col1 = 'abc;de', select * from table2]"

}

Explanation of the regular expression:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  ;                        ';'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (                        group and capture to \1 (0 or more times
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      '                        '\''
--------------------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      '                        '\''
--------------------------------------------------------------------------------
    )*                       end of \1 (NOTE: because you are using a
                             quantifier on this capture, only the
                             LAST repetition of the captured pattern
                             will be stored in \1)
--------------------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
Sign up to request clarification or add additional context in comments.

1 Comment

what if the select statement is surrounded by single quote and semicolon is surrounded by double quote? I mean how to add semicolon within double quote condition to this regex?
1

I come in a while after the battle, but if this can help someone...

The most simple way I found out is to rely on IBatis migration tool that has a batch file processing tool which internally splits the file into its statements.

Internally, in the ScriptRunner class, they use :

private static final Pattern DELIMITER_PATTERN = Pattern.compile("^\\s*((--)|(//))?\\s*(//)?\\s*@DELIMITER\\s+([^\\s]+)", 2);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.