17

I would like to match a string within parentheses like:

(i, j, k(1))
^^^^^^^^^^^^

The string can contain closed (nested) parentheses too. How can I match it with a regular expression in Java without writing a parser? This is only a small part of my project. Thanks!

Edit:

I want to search a block of text for something like u(i, j, k), u(i, j, k(1)) or just u(<anything within this paired parens>), and replace them with __u%array(i, j, k) and __u%array(i, j, k(1)) for my Fortran translating application.

8
  • Is there a maximum "depth" of parentheses, or can you have any depth? Commented Jul 20, 2013 at 5:26
  • It doesn't sound like you need a very sophisticated parser... Commented Jul 20, 2013 at 5:27
  • You can't do this with regexes, at least not with Java, since its regexes are not recursive. PCRE can do this, though. But you should use/write a parser. For instance, you can try parboiled. Commented Jul 20, 2013 at 5:31
  • @radai I do not have a depth limit, but if one is needed, I can accept it. Commented Jul 20, 2013 at 5:31
  • @LiDong This could be an XY problem. What do you want, exactly? To tell whether the string is well-formed? The data in it? What? Commented Jul 20, 2013 at 5:34

2 Answers 2

33

As I said, contrary to popular belief (don't believe everything people say) matching nested brackets is possible with regex.

The downside is that you can only match up to a fixed level of nesting, and for every additional level you wish to support, your regex gets bigger and bigger.

But don't take my word for it. Let me show you. The regex:

\([^()]*\)

Matches one level. For up to two levels, you'd need:

\(([^()]*|\([^()]*\))*\)

And so on. To add another level, all you have to do is replace the innermost (second) [^()]* part with ([^()]*|\([^()]*\))*. As I said, it will get bigger and bigger.
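The substitution described above can also be done in a loop instead of by hand. This is only a sketch: the class name NestedParenRegex and the helpers nestedParens and rewrite are made up for illustration, and it uses non-capturing groups (?:...) so that the whole parenthesized block stays in group 2 of the replacement pattern.

```java
import java.util.regex.Pattern;

class NestedParenRegex {
    // Builds a regex matching balanced parentheses nested up to `levels` deep
    // (hypothetical helper, not part of any library).
    static String nestedParens(int levels) {
        if (levels < 1) {
            throw new IllegalArgumentException("levels must be >= 1");
        }
        String regex = "\\([^()]*\\)"; // one level: \([^()]*\)
        for (int i = 1; i < levels; i++) {
            // Wrap the previous level: \((?:[^()]*|<previous>)*\)
            regex = "\\((?:[^()]*|" + regex + ")*\\)";
        }
        return regex;
    }

    // Applies the same replacement as in the answer, for a chosen depth.
    static String rewrite(String code, int levels) {
        String regex = "(\\w+)(" + nestedParens(levels) + ")";
        return code.replaceAll(regex, "__$1%array$2");
    }

    public static void main(String[] args) {
        // Three levels are enough for k(m(2)) inside u(...).
        System.out.println(rewrite("u(i, j, k(m(2)))", 3));
        // prints: __u%array(i, j, k(m(2)))
    }
}
```

The pattern still grows with every level, so this only saves typing. Anything nested deeper than the chosen level is matched from an inner parenthesis instead of the outer one.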

Your problem:

For your case, two levels may be enough. So the Java code for it would be:

String fortranCode = "code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.";
String regex = "(\\w+)(\\(([^()]*|\\([^()]*\\))*\\))"; // (\w+)(\(([^()]*|\([^()]*\))*\))
System.out.println(fortranCode.replaceAll(regex, "__$1%array$2"));

Input:

code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.

Output:

code code __u%array(i, j, k) code code code code __u%array(i, j, k(1)) code code code u(i, j, __k%array(m(2))) should match this last 'u', but it doesnt.

Bottom line:

In the general case, parsers will do a better job - that's why people get so pissy about it. But for simple applications, regexes can pretty much be enough.
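For comparison, here is roughly what the non-regex route looks like if you need arbitrary depth. This is a different technique from the regex above (a plain scan with a depth counter), not a full parser, and only a sketch: the names BalancedParenRewriter, findClose and rewrite are invented for this example, identifiers are assumed to be ASCII letters, digits and underscores (the same set as \w), and parentheses inside string literals or comments are not handled.

```java
class BalancedParenRewriter {
    // Same character set as the regex class \w: [a-zA-Z_0-9].
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Returns the index of the ')' that closes the '(' at `open`, or -1 if unbalanced.
    static int findClose(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    // Rewrites every name(...) into __name%array(...), at any nesting depth.
    // The contents of a rewritten block are copied verbatim, like the regex version.
    static String rewrite(String code) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < code.length()) {
            char c = code.charAt(i);
            if (!isWordChar(c)) {
                out.append(c);
                i++;
                continue;
            }
            int start = i;
            while (i < code.length() && isWordChar(code.charAt(i))) {
                i++;
            }
            String word = code.substring(start, i);
            int close = (i < code.length() && code.charAt(i) == '(') ? findClose(code, i) : -1;
            if (close >= 0) {
                out.append("__").append(word).append("%array").append(code, i, close + 1);
                i = close + 1;
            } else {
                out.append(word); // not a call, or unbalanced: leave untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("code u(i, j, k(m(2))) code"));
        // prints: code __u%array(i, j, k(m(2))) code
    }
}
```

Unlike the fixed-level regex, this rewrites the outer u in u(i, j, k(m(2))) and leaves the inner k(...) alone.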

Note: Some regex flavors support the recursion construct (?R) (Java's java.util.regex doesn't; PCRE, as used by PHP, and Perl do), which lets you match an arbitrary number of nesting levels. With them, you could do: \(([^()]|(?R))*\).


7 Comments

@acdjunior Sir, can you please explain why this regex: \((?:[^()]|(?:\([^()]*\)))*\) would not work for any depth?
@AhmedAkhtar Because it matches only two levels. Roughly speaking, it only matches two because after the second bracket is opened, it "ignores" any other opening bracket, meaning that it treats the first closing bracket it finds as closing the second opened bracket (not the last opened one). Example: aaa 1( aaa 2( aaa 3( aaa 4) aaa 5) aaa 6) aaa.., in this case, the two-level regex interprets 4) as closing 2(, not 3( as you'd expect.
Sir please try to answer to comments earlier, it has been almost a year since I posted this comment and I really don't remember the context in which I asked the question. Thanks anyways btw.
@AhmedAkhtar Yes, of course, sorry about the delay. Sometimes we don't answer right away and end up forgetting it altogether. Will try to be quicker next time, anyway. Cheers!
I think this regex does not work for nested patterns like (abc (de) (fg) hi ). What modification can be done to the regex to support this ?
1

Separate your job. Have the regex be:

([a-z]+)\((.*)\)

The first group will contain the identifier, the second the parameters. Then proceed as such:

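import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern PATTERN = Pattern.compile("([a-z]+)\\((.*)\\)");

// ...

// Call matcher() on the compiled PATTERN instance, not on the Pattern class
final Matcher m = PATTERN.matcher(input);

if (!m.matches()) {
    // No match! Deal with it (e.g. return or throw here).
    return;
}

// If match, then:

final String identifier = m.group(1);
final String params = m.group(2);

// Test if there is a paren
final boolean hasNestedParen = params.indexOf('(') != -1;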

Replace [a-z]+ with whatever an identifier can be in Fortran.
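Put together as something you can compile, the splitting step might look like the sketch below. The class CallSplitter and its split method are names invented here; returning null for "no match" is just one possible way to deal with it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallSplitter {
    private static final Pattern PATTERN = Pattern.compile("([a-z]+)\\((.*)\\)");

    // Returns {identifier, params}, or null if the whole input is not name(...).
    static String[] split(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("u(i, j, k(1))");
        System.out.println(parts[0] + " | " + parts[1]);
        // prints: u | i, j, k(1)
    }
}
```

Keep in mind that matches() requires the entire input to be a single name(...) expression, and the greedy .* runs to the last closing paren. For "u(i) + v(j)" the second group would be "i) + v(j", so the input has to be cut into individual calls before this step.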
