2

I have searched high and low and am really struggling to find the appropriate REGEX that would help me retrieve what I want. Assume I have the following query string:

SELECT col, col2 AS c2, col3, col * col2 calc FROM...

I want a REGEX that will pull out everything between commas where there is a space or an "AS" that proceeds a column alias. It should also exclude the SELECT and FROM. With the example above, I would want the following matches:

col2 AS c2
col * col2 calc

Essentially, this should remove col and col3 because they don't have aliases. I have tried variations of the following but have been unsuccessful (Note: $colname contains the alias):

/(SELECT|,)([^,]*{$colname}[^,]*)(FROM|,)/

Currently this only returns:

, col2 AS c2,
col2 AS c2
, col2 AS c2,
col2 AS c2

I feel I am close but I need some guidance. Any suggestions are appreciated.

12
  • 1
    Maybe (?:\G(?!\A),|SELECT)(?:\s+\S+,)*\s*\K\S+\s*(?:\*|AS)\s*(?:(?!FROM)[^,])*(?!\w) will work. Commented Feb 24, 2018 at 17:16
  • Thank you for your time. That removed all the , col2 AS c2 but didn't get me col * col2 calc. Commented Feb 24, 2018 at 17:18
  • 2
    Regex are the wrong tool to parse anything. Use a SQL parser, this'll be much easier and more reliable. Google finds e.g. github.com/greenlion/PHP-SQL-Parser Commented Feb 24, 2018 at 17:36
  • 1
    preg_match only returns the first match. Commented Feb 24, 2018 at 17:36
  • 1
    @Dustin, use [\p{S}\p{P}] to match any punctuation/symbol. Or [[:punct:]] Commented Feb 24, 2018 at 18:18

1 Answer 1

2

It is next to impossible to handle SQL string parsing with one regex, you should look for a more generic solution like PHP-SQL-Parser.

The regex I have come up with is an ad-hoc work around that can be further precised, adjusted, etc. to fit the needs of a specific project:

(?:\G(?!\A),|SELECT)(?:\s+[^\s,]+(?:\([^()]*\))?,)*\s*\K[^\s,]+\s*(?:(?![()',])[[:punct:]]|AS)\s*(?:(?!FROM)[^,])*(?!\w)

See the regex demo.

Here is what it does:

  • (?:\G(?!\A),|SELECT) - the end of the previous successful match and a comma after (\G(?!\A),) or (|) a literal SELECT substring
  • (?:\s+[^\s,]+(?:\([^()]*\))?,)* - zero or more sequences ((?:...)*) of:
    • \s+ - 1+ whitespaces
    • [^\s,]+ - 1+ chars other than whitespace and a comma
    • (?:\([^()]*\))? - an optional substring like (...) (a (, then 0+ chars other than ( and ), and then a ))
    • , - a comma
  • \s* - 0+ whitespace chars
  • \K - a match reset operator discarding the text matched so far (so, SELECT abc, Contains(...), will get omitted from the match, but will be required for the match)
  • [^\s,]+ - 1 or more chars other than whitespace and ,
  • \s* - 0+ whitespace chars
  • (?:(?![()',])[[:punct:]]|AS) - either
    • (?![()',])[[:punct:]] - any punctuation or symbol char but a (, ), ' and ,
    • | - or
    • AS - a literal AS substring
  • \s* - 0+ whitespaces
  • (?:(?!FROM)[^,])* - zero or more chars other than a comma not starting a FROM char sequence (it is a tempered greedy token)
  • (?!\w) - not followed with a word char.
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.