0

I would like to parse the following string:

this OR (this AND (this OR (that AND this))) OR this

Into this:

  [
    "this",
    "OR",
    [
      "this",
      "AND",
      [
        "this",
        "OR",
        [
          "that",
          "AND",
          "this"
        ]
      ]
    ],
    "OR",
    "this"
  ]

However, I also need to support things like "this is a text" OR this which parses as ["this is a text", "OR", "this"] where the quotes represent the whole string which cancels the space-breaking between words.

But this is not an issue since it's not recursive like brackets. This can be done with regex.

After some research, this can not be done with ES regex, but it can be done with PHP's regex. Is there a way to do it with JavaScript's regex? If not, what are the best alternatives to use, maybe a prebuilt library for this?

5
  • How would you do that with PHP regex (I'm not a php guy)? Because theoretically, a regex cannot ensure that every opening bracket is matched with the correct closing bracket, if the depth of the expression tree is not limited. You would need a pushdown automata for that. And even if the depth is limited, the regex would be quite complicated. Of course regex in various languages do have some extensions to the pure regular expression concept how it is defined in Computer Science ... Commented Jun 28, 2021 at 10:34
  • @derpirscher Hello, sorry I wasn't really clear, I didn't mean recursively, because it doesn't do it. But it knows how to find the bracket that closes itself without taking any inner bracket: (\(([^()]|(?R))*\)). As you see it uses Subroutine token, it is not supported in other languages. But if you do know how to solve this like the PHP regex but in javascript, that'd solve all of my issues because I can recursively run the regex Commented Jun 28, 2021 at 13:35
  • tsk tsk for sneaking in a library recommendation request at the end of your question. but here you go: See Parsing in JavaScript: Tools and Libraries Commented Jun 28, 2021 at 19:22
  • @Ben Beri, "But this is not an issue since it's not recursive like brackets, this can be done with regex." - How It could be done with regex ? Commented Jul 3, 2021 at 17:00
  • @Nur Grouping all values by quotes is pretty simple task for regex, since it's not recursive, unlike brackets where you can have cases like ( ( ( ) ) ). And the regex for quotes I used was (["'])(?:(?=(\\?))\2.)*?\1 Commented Jul 3, 2021 at 17:24

1 Answer 1

5

String is just a stream of character.

  • At first we need a lexical analyzer. It take stream of character and generate token. Example:

    Input  = `"Hello world" (!)`
    Output =  Lex(Input) -> ["Hello world", "(" , "!", ")"]
    
  • We can use a recursive function. where we can create a stack on top and return new array.

    Tokens = ["Hello world", "(" , "!", ")"]
    Result = [ "Hello world", ["!"] ]
    

let input = `this OR (this AND (this OR (that AND this))) "this is a text" OR this`

function* Lex(characters) {
    let token = "", char;
    while (char = characters.shift()) switch (char) {
        case '(': case ')':
            if (token) { yield token; token = '' }
            yield char
            break
        case '"':
            let index = characters.indexOf(char);
            token = characters.splice(0, index).join('');
        case ' ':
            if (token) yield token
            token = ''
            break
        default: token += char
    }
}

function fnName(tokens) {
    let result = [], item;
    while (item = tokens.shift()) {
        if (item == ')') return result
        result.push(item == '(' ? fnName(tokens) : item)
    }
    return result
}

let tokens = [...Lex(input.split(''))];
console.log(fnName(tokens))

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.