0

Please consider this string:

string mySelects = @"SELECT * FROM TABLE1 WHERE 1=1
SELECT T.* 
FROM (SELECT * FROM TABLE2 WHERE 2=2) T
ORDER BY FOO
SELECT FIELD1, FIELD3
FROM TABLE3
GROUP BY FIELD";

Is there any regular expression to return a MatchCollection with these three Selects in each result?

Thanks in advance!

3
  • 7
    For such analysis you better use a SQL parser... regex is not the right tool... Commented Oct 29, 2013 at 15:27
  • 3
    Regexes are good for parsing languages without context, sql is more complicated. You will not be able to find a regex which gives you a good result in all cases. Commented Oct 29, 2013 at 15:29
  • I think it might be quite simple: give me all 'Select .*' expressions before another 'select' or end of string, ignoring 'select .*' between brackets. Any help? Commented Oct 29, 2013 at 16:26

2 Answers 2

2

There is no regex like that, and it is not possible to write one: the regex programming model is not powerful enough to deal with counting. It is not powerful enough even to balance parentheses in an expression - something that would be required to capture the third select in your example. The fact that .NET's regex engine supports balancing groups is not going to help you much: regex is hopelessly inadequate for the task at hand.

You need something more powerful than the regex - preferably, a complete SQL parser. I have tried this parser (registration is required), it worked fine on many queries. It also has a source in case you wish to adjust it to your needs.

Sign up to request clarification or add additional context in comments.

3 Comments

-1 Because of balancing groups. It's one of the reasons why I downvoted the other answer; you guys are getting tied up in the formal definition of regular languages when C# regexes aren't at all regular.
@NullUserException +1 for dasblinkenlight. The whole question is regarding parsing SQL and no regular expression engine is powerful enough to do so. I'm certain, that OP doesn't want to dwell into formal definitions of regular expression and grammars, so we included just enough theory for him to understand, why what is he trying to do is impossible.
@dasblinkenlight To be honest, I have not. But I'm just pointing out there are features in C#'s regexes that make it formally non-regular. BTW I've removed the downvote.
1

There is none.

Mathematically speaking, a regular expression describes some set of strings (these, which match this regular expression). Similarly a grammar describes a set of strings (which match this grammar). But grammars are a superset over regular expressions, because every regular expression can be described by a grammar, but unfortunately not the other way around.

SQL is a language, which is described by some grammar. But this grammar is way too complicated to be described by a regular expression. You need some more mathematical firepower to process this language.

The solution is to seek for a ready-made SQL parser or write one using tools like ANTLR.

7 Comments

@dasblinkenlight What do you mean by a "spite vote"?
@dasblinkenlight I see. Tough luck :)
-1 for a couple of reasons. First because "a grammar is a superset of a regular expression" is a statement that doesn't make any sense. A formal grammar describes a formal language. This language can be regular. Also, no modern regex dialect is truly regular. C# regexes certainly aren't regular.
Note this isn't an endorsement for using regexes for this job, they aren't a great tool for this if you're starting from scratch (though you should keep in mind ANTLR's lexer uses regexes). I'm merely pointing out the answer contains technically inaccurate statements. If you fix those I'll remove the downvote (cc @dasblinkenlight)
@NullUserException I rephrased the answer to make it more clear. The overall point is, that no regular expression engine (even with some extensions) is powerful enough to process a SQL grammar. Note, that even previously I noted, that there are regular grammars (these, which match regular expressions). BTW, Please consider writing comments for downvotes in the future - if there are some uncertainties, we will be able to fix our answers way faster.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.