
Apologies if this is a duplicate. I am matching an SQL string using a regex in Java. My query may or may not have a WHERE clause.

I am using the following regex:

\bselect\b\s(.*)\s\bfrom\b\s(.+)\s(?:where)?\s(?:(.*))

which works fine for

select a,b from tab1 where a=b

but does not match

select a,b from tab1

If I add two extra spaces at the end, it matches. This is clearly due to the two \s tokens I have used, but I want to make those optional as well.

Please help. I could not understand the other posts on this topic on Stack Overflow.
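For reference, here is a minimal Java sketch that reproduces the behavior described above using the question's regex unchanged. It assumes the whole string is tested with Matcher.matches(); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuestionRegexDemo {
    // The regex from the question, escaped for a Java string literal.
    // Note the two mandatory \s tokens around the optional (?:where)? group.
    static final Pattern QUESTION_REGEX = Pattern.compile(
            "\\bselect\\b\\s(.*)\\s\\bfrom\\b\\s(.+)\\s(?:where)?\\s(?:(.*))");

    // True if the entire string matches the question's regex.
    static boolean fullMatch(String sql) {
        return QUESTION_REGEX.matcher(sql).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("select a,b from tab1 where a=b")); // true
        System.out.println(fullMatch("select a,b from tab1"));           // false
        // Two trailing spaces give the two \s tokens something to consume.
        System.out.println(fullMatch("select a,b from tab1  "));         // true
    }
}
```

Without a WHERE clause nothing follows the table name, so the two \s tokens after (.+) have nothing to match; that is exactly why the extra spaces make it work.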

  • Apologizing for a possible duplicate is pointless. Search first; if it's a dupe, don't post. So far it appears this isn't one, but you should know that that's the process to go through before posting. Commented Mar 6, 2015 at 13:06
  • You can test a regex online on sites like regex101.com against different input combinations. Commented Mar 6, 2015 at 13:08
  • Keep in mind that a regex can only match simple SQL. If, for example, you have embedded queries, it will fail without much fancier syntax. Commented Mar 6, 2015 at 13:23
  • Hi, I have tried a lot of combinations. The tricky part is making it work both with and without a WHERE clause. The solution I finally got needs two extra spaces when there is no WHERE, but I want a better one that handles the optional spaces as well. Commented Mar 6, 2015 at 17:07
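As a concrete illustration of the point about embedded queries, this Java sketch runs the question's regex, unchanged, against a query with a subquery in its FROM clause. Whole-string Matcher.matches() is assumed and the class name is hypothetical. The regex still reports a match, but the greedy (.*) runs on to the inner FROM, so the captured column list and table name are wrong.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedQueryDemo {
    // The regex from the question, escaped for a Java string literal.
    static final Pattern QUESTION_REGEX = Pattern.compile(
            "\\bselect\\b\\s(.*)\\s\\bfrom\\b\\s(.+)\\s(?:where)?\\s(?:(.*))");

    public static void main(String[] args) {
        Matcher m = QUESTION_REGEX.matcher("select a from (select b from t) where x=1");
        if (m.matches()) {
            // Greedy (.*) extends to the last " from ", i.e. the one inside the subquery.
            System.out.println("columns = " + m.group(1)); // a from (select b
            System.out.println("table   = " + m.group(2)); // t)
            System.out.println("where   = " + m.group(3)); // x=1
        }
    }
}
```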

3 Answers

1

Put the spaces inside the optional non-capturing group:

\bselect\b\s(.*)\s\bfrom\b\s(\w+)(?:\swhere\s(.*))?
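A minimal sketch of how this regex might be used from Java, assuming Matcher.matches() on the whole query; SelectRegexDemo and parse are hypothetical names. Because the whole WHERE part is optional, group(3) is null when the query has no WHERE clause.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectRegexDemo {
    // The answer's regex, escaped for a Java string literal. The spaces around
    // "where" live inside the optional group, so they are only required with a WHERE.
    static final Pattern SELECT = Pattern.compile(
            "\\bselect\\b\\s(.*)\\s\\bfrom\\b\\s(\\w+)(?:\\swhere\\s(.*))?");

    // Returns {columns, table, condition}, with condition null when there is no WHERE.
    // Returns null if the whole string does not match.
    static String[] parse(String sql) {
        Matcher m = SELECT.matcher(sql);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String sql : new String[] {"select a,b from tab1 where a=b", "select a,b from tab1"}) {
            System.out.println(Arrays.toString(parse(sql)));
        }
    }
}
```

This prints [a,b, tab1, a=b] followed by [a,b, tab1, null]. The pattern is case-sensitive; passing Pattern.CASE_INSENSITIVE as the second argument to Pattern.compile would also accept upper-case keywords.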

5 Comments

Hi, this will not work for queries with a WHERE clause. I have tried it, but it works only for queries without WHERE.
@Srini: That's strange, it works for me. Could you explain which case doesn't work?
Please check the query select col1 from tab1 where a=2: group1 comes out as tab1 and group2 as tab1 where a=2.
@Srini: Replace the .+ for the table name with \w+. See my edit.
That worked, thanks. What difference did that make in terms of parsing? Just curious.
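The difference comes from greediness. .+ can also match spaces, so it takes "tab1 where a=2" all the way to the end of the string; since the WHERE group is optional, the overall match already succeeds with that group empty and the engine never needs to backtrack. \w+ matches only letters, digits and underscores, so it has to stop at tab1, and the optional group then matches " where a=2". The Java sketch below compares both versions on the query from the comments (hypothetical class name, Matcher.matches() assumed).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyTableNameDemo {
    // First version of the answer: (.+) captures the table name.
    static final Pattern DOT_PLUS = Pattern.compile(
            "\\bselect\\b\\s(.*)\\s\\bfrom\\b\\s(.+)(?:\\swhere\\s(.*))?");
    // Edited version: (\w+) captures the table name.
    static final Pattern WORD_PLUS = Pattern.compile(
            "\\bselect\\b\\s(.*)\\s\\bfrom\\b\\s(\\w+)(?:\\swhere\\s(.*))?");

    // Returns {table, condition} for a full match, or null if the pattern does not match.
    static String[] tableAndCondition(Pattern p, String sql) {
        Matcher m = p.matcher(sql);
        return m.matches() ? new String[] {m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        String sql = "select col1 from tab1 where a=2";
        // .+ also matches spaces, so it greedily runs to the end of the string;
        // the optional WHERE group is then left empty and group(3) is null.
        System.out.println(Arrays.toString(tableAndCondition(DOT_PLUS, sql)));
        // \w+ cannot match a space, so it stops after "tab1" and the WHERE group matches.
        System.out.println(Arrays.toString(tableAndCondition(WORD_PLUS, sql)));
    }
}
```

This prints [tab1 where a=2, null] and then [tab1, a=2]. One side effect worth knowing: \w does not match a dot, so a schema-qualified name such as schema.tab1 no longer matches at all.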
1

Not answering the question directly, but using a regex here is the wrong choice. SQL is a grammar, so any regex you come up with that can manage all cases will rapidly become far too complex to manage.

You should look at Antlr, which will allow you to define the grammar and act on whatever bits of it you like from within Java. There's even a pre-built grammar for SQLite which will allow you to get started very quickly.

2 Comments

Thanks very much for the details. I need to build a full-fledged SQL parsing solution anyway, so I will look at Antlr. I knew a regex would not be the final solution, but I tried to parse a simple query with one, ran into this problem, and wanted to ask on Stack Overflow about the whitespace issue. If you have any more details on preparing an SQL grammar, could you please share them?
Using Antlr is really easy. As mentioned, there is a pre-built grammar for SQLite, and if you need to support a different variant you can do so by taking the SQLite grammar as a base. The Antlr homepage gives you a quickstart, and there's not a lot more to it than that. Bear in mind that it is only a parser, so you will have to provide the backend model for the SQL statements; once you have that, you just need to extend the base listener class provided by Antlr and build on it as you support more of the feature set.
0

You don't need to check both that a keyword ends at a word boundary and that it is followed by a space: a following space already implies the boundary. \s+ matches one or more whitespace characters, so extra spaces are allowed. \S matches any non-whitespace character, so it is preferable to ., which also matches spaces.

\bselect\s+(\S+)\s+from\s+(\S+)\s*(?:where\s+(\S+))?
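A minimal Java harness for this pattern, assuming Matcher.find() (the answer does not say whether find() or matches() is intended); the class name is hypothetical. It also shows the trade-off of \S+: a value that contains a space is either cut short or not matched at all.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonSpaceRegexDemo {
    // Pattern from the answer: \s+ tolerates runs of spaces, \S+ never matches a space.
    static final Pattern SELECT = Pattern.compile(
            "\\bselect\\s+(\\S+)\\s+from\\s+(\\S+)\\s*(?:where\\s+(\\S+))?");

    // Returns {columns, table, condition} from the first match, or null if nothing matches.
    static String[] parse(String sql) {
        Matcher m = SELECT.matcher(sql);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        String[] queries = {
            "select a,b from tab1 where a=b",
            "select a,b  from   tab1",          // extra spaces are fine
            "select a,b from tab1 where a = b", // condition stops at the first space
            "select a, b from tab1"             // a space in the column list breaks the match
        };
        for (String sql : queries) {
            System.out.println(sql + " -> " + Arrays.toString(parse(sql)));
        }
    }
}
```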

1 Comment

@DavidKnipe Thanks for the edit. I modified my answer so that it actually works.
