
EDIT: My question was closed for lacking information and context, so I am adding more details here, but what I really need is described further down. If more details are needed, please leave a comment saying what is missing.

I have a piece of software that builds pages and datalists from hundreds of queries. Each query is stored in a database as a single string. I need to replicate those datalists in a second piece of software, and there are too many to do it manually. This second software has a shell-script method that accepts a JavaScript file as input, in which I can pass JSON in a specific format so that the software builds the pages and datalists automatically. So I wrote JavaScript code that starts by splitting each query on SQL keywords:

["SELECT COL_A, COL_B...", "FROM TABLE_A, TABLE_B...", "WHERE..."]

The first two parts, the SELECT and FROM parts of the query, are already processed and included in my JSON.


My problem is the third part, the WHERE clause. It can be empty or contain one, two, three, or N conditions, possibly grouped with parentheses, and my JSON can't accept all that complexity in a single row. So I need to split it into an array in which each individual condition, each and/or, and each parenthesis is its own element. The exception is the parentheses after IN, which must stay with the condition because IN requires them. I can then use this array to finish my JSON.

Examples of possible strings:

str = null
str = "COL_A = 'foo'"
str = "COL_A like '%foo%' and COL_B > 1"
str = "(COL_A is not null or COL_B is null) and COL_C <> 'foo'"
str = "(COL_A = SYSDATE and COL_B in (1, 2, 3)) or (COL_C not like '%[foo$foo]%' and (COL_D = 1 or COL_E = 2))"

Desired output of each previous example:

array = []  //size 0
array = ["COL_A = 'foo'"]  //size 1
array = ["COL_A like '%foo%'", "and", "COL_B > 1"]  //size 3
array = ["(", "COL_A is not null", "or", "COL_B is null", ")", "and", "COL_C <> 'foo'"]  //size 7
array = ["(", "COL_A = SYSDATE", "and", "COL_B in (1, 2, 3)", ")", "or", "(", "COL_C not like '%[foo$foo]%'", "and", "(", "COL_D = 1", "or", "COL_E = 2", ")", ")"]  //size 15

My biggest problem is, of course, how to skip the parenthesis pairs that belong to an IN clause. My knowledge of JS and regex isn't good enough to accomplish that. Is it possible to do this with a single call to String.split() or another native JS function? Or do I have to write my own function, maybe a recursive one?

1
  • Sounds like an XY problem. Why do you need to parse queries in js? Commented Dec 19, 2019 at 19:56

1 Answer


What you need is a parser. Forget about writing one god regex to match patterns this complex. Split the string into tokens and process the tokens to get your desired output.

You can find many descriptions of how to write parsers. I have read them too, and most of them are abstract mathematical material I don't understand. Either I lack the knowledge to follow pseudocode and advanced math, or they lack the means to explain it. It doesn't matter, because none of that is necessary here. Just use common sense about where you can split: at whitespace, at commas, and at words like "and", "in", "not", "like", etc., so it is not hard. After that, you can go through the tokens and apply syntactic rules, for example that there must be as many opening parentheses as closing ones.

You can also download an existing SQL parser if you want to. I just don't understand why someone would need to parse SQL, because we usually build it ourselves and don't like accepting it as user input...
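To make the token idea concrete, here is a minimal sketch of a hand-written scanner for the WHERE strings shown in the question. It is only one possible approach, not a full SQL parser. The function name `splitWhere` is made up for this example, and it assumes an ES2015+ JavaScript engine, single-quoted string literals with `''` as the escaped quote, and that the only parentheses that should stay inside a condition are the ones directly after `in`.

```javascript
// Hypothetical helper: splits a WHERE string into conditions, and/or, and grouping parentheses.
function splitWhere(str) {
  const out = [];
  if (!str) return out;          // null or empty string gives an empty array

  let buf = "";                  // text of the condition currently being collected
  let inDepth = 0;               // > 0 while inside the parentheses of an IN (...) list

  const flush = () => {
    const clause = buf.trim();
    if (clause) out.push(clause);
    buf = "";
  };

  let i = 0;
  while (i < str.length) {
    const ch = str[i];

    // Copy a quoted literal verbatim so its content is never treated as syntax.
    if (ch === "'") {
      let j = i + 1;
      while (j < str.length) {
        if (str[j] === "'") {
          if (str[j + 1] === "'") { j += 2; continue; }  // '' is an escaped quote
          break;
        }
        j++;
      }
      buf += str.slice(i, j + 1);
      i = j + 1;
      continue;
    }

    // Inside an IN list: keep everything, only track nesting until it closes.
    if (inDepth > 0) {
      if (ch === "(") inDepth++;
      else if (ch === ")") inDepth--;
      buf += ch;
      i++;
      continue;
    }

    if (ch === "(") {
      if (/\bin\s*$/i.test(buf)) {   // "(" right after the word "in" belongs to the condition
        inDepth = 1;
        buf += ch;
      } else {                       // otherwise it is a grouping parenthesis
        flush();
        out.push("(");
      }
      i++;
      continue;
    }

    if (ch === ")") {
      flush();
      out.push(")");
      i++;
      continue;
    }

    // "and" / "or" count only as whole words, not as part of a longer identifier.
    const startsWord = i === 0 || /\W/.test(str[i - 1]);
    const m = startsWord ? /^(and|or)\b/i.exec(str.slice(i)) : null;
    if (m) {
      flush();
      out.push(m[0]);
      i += m[0].length;
      continue;
    }

    buf += ch;
    i++;
  }

  flush();
  return out;
}

console.log(splitWhere("(COL_A is not null or COL_B is null) and COL_C <> 'foo'"));
// [ '(', 'COL_A is not null', 'or', 'COL_B is null', ')', 'and', "COL_C <> 'foo'" ]
```

This sketch has known gaps: the `and` in `BETWEEN x AND y` would be split like any other `and`, function calls such as `UPPER(COL_A)` would be treated as grouping parentheses, and identifiers containing `$` or `#` are not recognized as single words. If the stored queries use any of these, the scanner needs extra rules, or a real SQL parser is the safer choice.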
