I have a string like the one below:
Sum(Height(In))

I need to split it like this:
Sum
Height(In)

I tried the following regex, but had no luck:

/[ .:;?!~,`"&|^\((.*)\)$<>{}\[\]\r\n/\\]+/

Is there any way to achieve this?

Thanks in advance.

  • Regular expressions are not good at dealing with balanced constructs like nested parentheses. Commented Mar 17, 2021 at 19:19
  • const [_, one, two] = text.match(/^([^()]+)\((.*)\)$/)? Commented Mar 17, 2021 at 19:20
  • How much nesting is possible? Are arbitrary expressions possible? As Barmar mentions, you'll probably need a stack or parser to handle the recursion and any additional complexity you expect to handle. Commented Mar 17, 2021 at 19:20
  • Another approach: const [_, one, two] = text.match(/(\w+)\((\w+\([^()]*\))\)/) Commented Mar 17, 2021 at 19:25
  • First one working as expected. Thanks @WiktorStribiżew Commented Mar 17, 2021 at 19:33

2 Answers

You can match everything up to the first ( and then everything between that first ( and the ) at the end of the string:

const [_, one, two] = "Sum(Height(In))".match(/^([^()]+)\((.*)\)$/);
console.log(`The first value is: ${one}, the second is ${two}`);

See the regex demo. If the last ) is not at the end of the string, remove the $ end-of-string anchor. If there can be line breaks inside the parentheses, replace .* with [\w\W]*.

Regex details:

  • ^ - start of string
  • ([^()]+) - Group 1: one or more chars other than ( and )
  • \( - a ( char
  • (.*) - Group 2: any zero or more chars other than line break chars, as many as possible (* is greedy)
  • \) - a ) char
  • $ - end of string.
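Note that match() returns null when the string has no parentheses, so destructuring the result directly throws in that case. A minimal sketch of a guarded version follows; the helper name splitOuterCall is hypothetical, and it uses the [\w\W]* variant so that line breaks inside the parentheses are allowed:

```javascript
// Hypothetical helper: returns [name, inner] or null when the input
// does not look like name(...).
function splitOuterCall(text) {
  // Group 1: everything before the first "(".
  // Group 2: everything up to the ")" that ends the string.
  const m = text.match(/^([^()]+)\(([\w\W]*)\)$/);
  return m ? [m[1], m[2]] : null;
}

const parts = splitOuterCall("Sum(Height(In))");
console.log(parts);                 // the two pieces: "Sum" and "Height(In)"
console.log(splitOuterCall("Sum")); // null, no parentheses to split on
```

Returning null instead of throwing lets the caller decide what to do with input that does not have the expected shape.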

You can do it, but only in a limited way. You need to fix the maximum number of nesting levels of parentheses you allow, because the unbounded case defines a language that is not regular. Regular expressions accept regular languages (those described by a restricted kind of grammar, called a regular grammar, or recognized by a finite state automaton), while the language of parentheses nested to unbounded depth requires a context-free grammar (normally implemented as a stack-based automaton).

The solution in Wiktor Stribiżew's answer is valid if you are willing to accept expressions with unbalanced parentheses (more opening parentheses than closing ones, or vice versa). If you want to stop exactly at the parenthesis that matches the initial one, you need a context-free grammar parser. See below for an explanation of why.

To build the regular expression, first express what the innermost level (the deepest nesting level) can contain: something that allows neither an opening nor a closing parenthesis. For this explanation I will stop at three levels of parentheses. You can extend it to more; the only requirements are that you stop at some level and have enough patience. Below is a regular expression that matches anything but a parenthesis:

[^()]*

Let me call this expression L0. To allow a pair (or a sequence of pairs) of matching parentheses, we can form a second regexp, L1, as shown below. The notation {L0} (braces around the name, so that the operators of the regular expression are easier to see) stands for the regexp above:

{L0} (\( {L0} \) {L0} )*

which means a sequence of L0 expressions interspersed with L0 expressions enclosed in a pair of parentheses. I'll expand {L0} only in this case, to illustrate how the regular expression gets more and more complex at each stage. (You can build this regular expression with a program, and you'll get a very complex regular expression that parses a bounded number of nested parentheses very efficiently.)

[^()]* (\( [^()]* \) [^()]* )*

(I left spaces around the sub-expressions for readability, but to use the regexp you need to remove all the embedded spaces.)

This regular expression is L1, and it serves to build the regular expression for level 2, which is formed by the following sequence:

{L1} (\( {L1} \) {L1} )*

where each {L1} is expanded into the regular expression we got above. This expression will be called L2.

After this, a pattern emerges: for a maximum of n levels, you repeat this process, substituting the level n-1 expression into the template for Ln, which is:

{Ln-1} (\( {Ln-1} \) {Ln-1} )*

and this regular expression will be called Ln. The total length of the regular expression at least triples with each level of nesting, so for n levels you can expect a regexp of around 6*3^n characters; for six levels of nesting, for example, that is approximately 4374 characters. You can use a computer to generate the regular expression, compile it, and see how efficient it is (in one pass, checking just one character at a time, it tells you whether a string with up to six levels of nested parentheses matches).
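The construction above can be generated with a short loop. The following is only a sketch: the function name buildNestedPattern is hypothetical, and it uses a non-capturing group (?:...) instead of the plain group shown above, which is my substitution to avoid creating many capture groups (it adds two characters per group, so lengths differ slightly from the estimate). Also note that JavaScript engines typically use a backtracking matcher, so the one-pass behaviour described above should not be assumed for very deep patterns.

```javascript
// Hypothetical helper: builds the pattern Ln for a maximum nesting depth.
function buildNestedPattern(levels) {
  let pattern = "[^()]*"; // L0: anything except parentheses
  for (let i = 0; i < levels; i++) {
    // Ln = {Ln-1} ( \( {Ln-1} \) {Ln-1} )*
    pattern = `${pattern}(?:\\(${pattern}\\)${pattern})*`;
  }
  return pattern;
}

// Anchored so the whole string must be balanced, with depth at most 2.
const upToTwoLevels = new RegExp(`^${buildNestedPattern(2)}$`);
console.log(upToTwoLevels.test("Sum(Height(In))")); // true
console.log(upToTwoLevels.test("a(b(c(d)))"));      // false, three levels deep
console.log(buildNestedPattern(6).length);          // grows roughly 3x per level
```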

Going beyond a few levels poses a serious problem for the regexp, and a context-free grammar parser has to be used. It's common to parse JSON data structures with over 10 levels of nesting, and this would require a regexp of around 6*3^10 characters (about 354k characters long), which makes this approach impractical.
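For comparison, here is a minimal sketch of the non-regex approach for the string in the question. It is not a full context-free parser: with only one kind of bracket, a depth counter is enough to stand in for the stack. The function name splitCall and the shape of the returned object are assumptions made for illustration.

```javascript
// Hypothetical helper: splits "name(inner)rest" at the ")" that matches
// the first "(". Returns null if there is no "(" or it is never closed.
function splitCall(text) {
  const open = text.indexOf("(");
  if (open === -1) return null;
  let depth = 0; // plays the role of the stack height
  for (let i = open; i < text.length; i++) {
    if (text[i] === "(") {
      depth++;
    } else if (text[i] === ")") {
      depth--;
      if (depth === 0) {
        return {
          name: text.slice(0, open),
          inner: text.slice(open + 1, i),
          rest: text.slice(i + 1),
        };
      }
    }
  }
  return null; // ran out of input before the first "(" was closed
}

console.log(splitCall("Sum(Height(In))")); // name "Sum", inner "Height(In)", rest ""
console.log(splitCall("Sum(a)+Max(b)"));   // name "Sum", inner "a", rest "+Max(b)"
console.log(splitCall("Sum(Height(In)"));  // null, unbalanced
```

Unlike the greedy regex, this stops at the parenthesis that actually closes the first one, works for any nesting depth, and reports unbalanced input.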
