I want to extract the function arguments from a string such as:

sample( 5*5 ) euros

This works correctly with:

([^\s\)]+)\(([^\)]+)\)

Demo here.

The problem is when I put another function inside the argument:

sample( decimal( 5*5 ) ) euros

With a single outer function call, this works:

([^\s\)]+)\((.+)\)

Demo here.

But with two functions or more I can't get the function arguments:

sample( decimal( 5*5 ) ) toString(euros)

How can I get the function arguments with a regular expression?

5 Comments

  • What is the regex flavor (regex library, programming language, tool)? Check out ([^\s)]+)(\(((?>[^()]++|(?1))*)\)) - but if you are making a parser, I guess you need no regex. Commented Jul 18, 2016 at 6:42
  • I am making a parser with PHP: preg_match('/([^\s\)]+)\((.+)\)/', 'sample( decimal( 5*5 ) ) toString(euros)', $matches) Commented Jul 18, 2016 at 6:43
  • I'm sure @Wiktor can give you a clever regex for this, but if you really expect nested function calls to an arbitrary depth, you should consider using a parser. Commented Jul 18, 2016 at 6:43
  • Regular expressions are not able to deal with nested structures. You will have to use a different tool. There are configurable grammar parsers that can deal with these problems (for example pegjs.org for JavaScript). Commented Jul 18, 2016 at 6:44
  • The basic algorithm for your parser should be: Eat opening parentheses ( pushing them onto a stack along with whatever content comes after them. When you hit the first ) you can pop the contents, and this is a function argument or set of arguments. Commented Jul 18, 2016 at 6:46
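
The stack approach described in the last comment could look roughly like the sketch below. It is only an illustration: extract_calls is a hypothetical helper, not a library function, and it assumes that a function name is the run of non-whitespace, non-parenthesis characters immediately before an opening parenthesis.

```php
<?php
// Hypothetical helper (not part of PHP or any library): returns every call
// found in $input as ['name' => ..., 'args' => ...], innermost calls first.
function extract_calls(string $input): array
{
    $stack = [];   // one entry per currently open parenthesis
    $calls = [];   // completed calls, in the order their ")" was seen
    $name = '';    // characters seen since the last whitespace or parenthesis

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            // Remember which name opened here and where its arguments start.
            $stack[] = ['name' => $name, 'start' => $i + 1];
            $name = '';
        } elseif ($char === ')') {
            if ($stack === []) {
                throw new InvalidArgumentException("Unbalanced ')' at offset $i");
            }
            // Pop the matching "(" and slice out everything between the pair.
            $open = array_pop($stack);
            $args = substr($input, $open['start'], $i - $open['start']);
            $calls[] = ['name' => $open['name'], 'args' => trim($args)];
            $name = '';
        } elseif (ctype_space($char)) {
            $name = '';
        } else {
            $name .= $char;
        }
    }
    if ($stack !== []) {
        throw new InvalidArgumentException("Unbalanced '(' in input");
    }
    return $calls;
}

print_r(extract_calls('sample( decimal( 5*5 ) ) toString(euros)'));
```

For the sample input this yields decimal with 5*5, then sample with decimal( 5*5 ), then toString with euros. Inner calls come first because their closing parenthesis is reached first.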

2 Answers

If you are writing a parser, you can do without a regex. For educational purposes, though: the PCRE regex flavor used by PHP supports recursion and subroutine calls, which make it possible to match balanced parentheses.

Have a look at

(?<name>[^\s()]+)(\((?<body>(?>[^()]++|(?2))*)\))

See the regex demo

Group "name" will contain the function name and "body" group will hold what is inside the matching parentheses.
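
As a minimal sketch of how the two named groups might be read in PHP, the snippet below runs the pattern over the string from the question with preg_match_all. The variable names and the trim() call are illustrative choices, not part of the original answer.

```php
<?php
// Pattern from the answer: "name" is the function name, "body" the text
// between its matching parentheses.
$pattern = '/(?<name>[^\s()]+)(\((?<body>(?>[^()]++|(?2))*)\))/';
$input = 'sample( decimal( 5*5 ) ) toString(euros)';

// PREG_SET_ORDER gives one array per top-level call found in the input.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$calls = [];
foreach ($matches as $m) {
    // trim() only strips the padding spaces around the argument text.
    $calls[] = ['name' => $m['name'], 'args' => trim($m['body'])];
}
print_r($calls);
```

This finds two top-level calls: sample with decimal( 5*5 ) and toString with euros. The nested decimal call is not reported separately; running the same pattern on a captured body would be one way to descend into it.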

Note that the negated character class in (?<name>[^\s()]+) must exclude both ( and ). With an input such as sample(decimal(3*3)), a class that excludes only ) lets this group run past the first ( and capture sample(decimal as the function name.
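
A small comparison can make the difference visible. This is a sketch with illustrative variable names; the only change between the two patterns is the character class of the first group.

```php
<?php
$input = 'sample(decimal(3*3))';

// Name class that excludes only whitespace and ")".
$loose = '/(?<name>[^\s\)]+)(\((?<body>(?>[^()]++|(?2))*)\))/';
// Name class that excludes "(" as well.
$strict = '/(?<name>[^\s()]+)(\((?<body>(?>[^()]++|(?2))*)\))/';

preg_match($loose, $input, $looseMatch);
preg_match($strict, $input, $strictMatch);

echo $looseMatch['name'], ' | ', $looseMatch['body'], "\n";   // sample(decimal | 3*3
echo $strictMatch['name'], ' | ', $strictMatch['body'], "\n"; // sample | decimal(3*3)
```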

The (\((?<body>(?>[^()]++|(?2))*)\)) part is a capture group (with ID=2) that can be recursed (i.e. "repeated", "expanded" many times) with a subroutine call (?2).

It matches

  • \( - an open round bracket
  • (?<body>(?>[^()]++|(?2))*) - Group "body" that matches zero or more sequences of:
    • [^()]++ - 1+ characters other than ( and ) or
    • (?2) - the whole \((?<body>(?>[^()]++|(?2))*)\) subpattern
  • \) - a closing parenthesis
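
The same breakdown can be kept next to the pattern itself by using the x (extended) modifier, which ignores whitespace in the pattern and allows # comments. The layout below is one possible way to write it, with ~ as the delimiter; it is meant to be equivalent to the one-line pattern above.

```php
<?php
$pattern = '~
    (?<name> [^\s()]+ )   # group 1: function name, no whitespace or parentheses
    (                     # group 2: one balanced pair, the target of (?2)
        \(                #   opening parenthesis
        (?<body>          #   group 3: everything inside the pair
            (?>           #     atomic group, repeated zero or more times
                [^()]++   #       run of non-parenthesis characters
              |           #       or
                (?2)      #       a nested balanced pair (subroutine call)
            )*
        )
        \)                #   closing parenthesis
    )
~x';

preg_match($pattern, 'sample( decimal( 5*5 ) ) euros', $m);
echo $m['name'], "\n";        // sample
echo trim($m['body']), "\n";  // decimal( 5*5 )
```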

A (?2) subroutine call is needed here, rather than whole-pattern recursion with (?R), because only a part of the pattern (the parenthesized group) has to be repeated/recursed.

Since Group 2 is a purely "technical" capture group that exists only to be recursed, it is a good idea to use named capture groups for the parts we actually want to read.


4 Comments

Just corrected: (?1) should be (?2); fixed everywhere in the answer.
Now it is perfect! Thanks very much. It is a very simple parser with only four functions. Thanks for explaining it.
Is it possible with sample(decimal( 5*5 ))? Without spaces it doesn't work. Check it.
You need to fix the first capture group: (?<funcion>[^\s()]+)(\((?<argumento>(?>[^()]++|(?2))*)\)). I updated the answer with an explanation for the first capture group.

Use a lookahead asserting that the next parenthesis (if any) is an opening one, and use a reluctant quantifier.

This should work:

([^\s\)]+)\((.+?)\)(?=[^()]*(\(|$))
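
A quick way to try this pattern in PHP is sketched below; variable names are illustrative. It handles the string from the question, but because it does not count parentheses, the second input (nested calls with no spaces) is split in the wrong place.

```php
<?php
$pattern = '/([^\s\)]+)\((.+?)\)(?=[^()]*(\(|$))/';

// String from the question: two top-level calls, one of them nested.
preg_match_all($pattern, 'sample( decimal( 5*5 ) ) toString(euros)', $spaced, PREG_SET_ORDER);
foreach ($spaced as $m) {
    echo $m[1], ' | ', trim($m[2]), "\n";
}
// sample | decimal( 5*5 )
// toString | euros

// Same kind of nesting without spaces: the name group runs past the first "(".
preg_match($pattern, 'sample(decimal(5*5))', $compact);
echo $compact[1], ' | ', $compact[2], "\n"; // sample(decimal | 5*5)
```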
