I'm trying to match and extract all the table names and columns from any given MySQL query.
The identifiers in the given query are not quoted with backticks, and according to the MySQL documentation the naming rules are:
Permitted characters in unquoted identifiers:
ASCII: [0-9,a-z,A-Z$_] (basic Latin letters, digits 0-9, dollar, underscore)
Extended: U+0080 .. U+FFFF
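As an illustration, the two ranges above can be written as a single character class. This is a minimal Python sketch that checks characters only; the names `IDENT` and `is_unquoted_identifier` are hypothetical, and other MySQL rules (length limits, reserved words) are not covered.

```python
import re

# Character class built from the rules quoted above: ASCII letters, digits,
# dollar, underscore, plus the extended range U+0080..U+FFFF.
IDENT = re.compile(r"[0-9A-Za-z$_\u0080-\uFFFF]+")


def is_unquoted_identifier(name: str) -> bool:
    """Return True if every character of name is permitted without quoting."""
    return IDENT.fullmatch(name) is not None
```

For example, `is_unquoted_identifier("first_name")` is true, while `is_unquoted_identifier("gmt offset")` is false because of the space.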
For a test case I'm using this query:
SELECT users.id , users.first_name ,users.last_name, roles.role,avatars.img_name,timezone.gmt_offset
FROM users
LEFT JOIN roles ON users.role = roles.id
LEFT JOIN avatars ON users.avatar=avatars.id
LEFT JOIN country ON users.country=country.country_code
LEFT JOIN timezone ON users.timezone = timezone.id
WHERE (users.id >=2 AND users.id <=4 ) OR (roles.role LIKE 'us%')
OR (roles.role = 'user(complex.sit )' && (timezone.gmt_offset >=7200
OR users.last_name ='tryme'))
LIMIT 0 , 30
My regex so far:
%[ .(),]?([a-z0-9_$]{2,})[ .(),]?(?!AND|OR|LIKE|SELECT|JOIN|ON)%i
I'm planning to capture the group and replace it with the match wrapped in backticks. The problem is that I can't filter out the reserved words that are being matched too (SELECT, JOIN, ...). I have tried adding a negative lookahead, but it doesn't work.
The second problem is with string values such as 'user(complex.sit )' in the example: I don't want the two words inside it (complex and sit) to be matched.
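To make the desired behavior concrete, one possible approach is to match string literals and identifier-like words in a single alternation, then decide in a callback what gets wrapped. The Python sketch below is only an illustration under several assumptions: `RESERVED` is a hypothetical, deliberately incomplete keyword list covering just the test query, only single-quoted literals are recognized, and comments, double-quoted strings, and already-quoted identifiers are not handled.

```python
import re

# Hypothetical, incomplete keyword list: just enough for the test query.
RESERVED = {"SELECT", "FROM", "LEFT", "JOIN", "ON", "WHERE",
            "AND", "OR", "LIKE", "LIMIT"}

TOKEN = re.compile(
    r"""
    (?P<string>'(?:[^'\\]|\\.|'')*')          # single-quoted literal
  | (?P<word>[A-Za-z0-9_$\u0080-\uFFFF]+)     # identifier, keyword or number
    """,
    re.VERBOSE,
)


def quote_identifiers(sql: str) -> str:
    """Wrap identifier-like words in backticks, skipping literals and keywords."""
    def repl(match: re.Match) -> str:
        word = match.group("word")
        if word is None:
            # The string alternative matched: return the literal untouched.
            return match.group(0)
        if word.upper() in RESERVED or (word.isascii() and word.isdigit()):
            # Keywords and plain numbers are left as they are.
            return word
        return f"`{word}`"

    return TOKEN.sub(repl, sql)
```

Because the literal alternative is tried first, words inside 'user(complex.sit )' are consumed as part of the string and never reach the identifier branch, and the keyword check happens on the whole word rather than through a lookahead.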
Any suggestions?