32

An item is a comma-delimited list of one or more strings of numbers or characters, e.g.

"12"
"abc"
"12,abc,3"

I'm trying to match a bracketed list of zero or more items in Java e.g.

""
"(12)"
"(abc,12)"
"(abc,12),(30,asdf)"
"(qqq,pp),(abc,12),(30,asdf,2),"

For the last example, this should return the following matching groups:

qqq,pp
abc,12
30,asdf,2

I've come up with the following (incorrect) pattern

\((.+?)\)(?:,\((.+?)\))*

which, for the last example, captures only the following

qqq,pp
30,asdf,2

Tips? Thanks

  • Could you just split the string on "),(" and remove the remaining brackets to achieve your result? Commented Aug 4, 2011 at 10:11
  • Definitely want Matcher.find(). Commented Jul 14, 2014 at 3:31

5 Answers

45

That's right. You can't have a "variable" number of capturing groups in a Java regular expression. Your Pattern has two groups:

\((.+?)\)(?:,\((.+?)\))*
  |___|        |___|
 group 1      group 2

Each group holds only the content of its last match. That is, abc,12 is first captured by group 2 and then overwritten by 30,asdf,2.
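The overwriting can be checked directly. The sketch below runs the question's pattern against its last example and collects every capturing group; the class and method names (`LastCaptureDemo`, `groups`) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Hypothetical helper: returns the contents of groups 1..n for the first match,
    // or an empty array when the pattern does not match at all.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String regex = "\\((.+?)\\)(?:,\\((.+?)\\))*";
        String input = "(qqq,pp),(abc,12),(30,asdf,2),";
        // The pattern has exactly two groups, however often the (?:...)* part repeats.
        for (String g : groups(regex, input)) {
            System.out.println(g);
        }
    }
}
```

Group 2 takes part in two repetitions here, but only the value from the final repetition is still available once matching has finished.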

Related question:

The solution is to use a single-group expression (something like \((.+?)\)) and call Matcher.find() repeatedly to iterate over the matches.
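A minimal sketch of that approach, assuming the input looks like the examples in the question; the class and method names (`BracketedItems`, `extract`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketedItems {
    // One capturing group: the text between a pair of parentheses.
    private static final Pattern ITEM = Pattern.compile("\\((.+?)\\)");

    // Collects the content of every parenthesised item, in order of appearance.
    static List<String> extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {          // each call resumes after the previous match
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        for (String item : extract("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(item);
        }
    }
}
```

Note that this only extracts the items; it does not check that they are separated by commas or that nothing else surrounds them. If that matters, the whole string can be validated with a separate pattern first.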


3 Comments

Argh ok thanks I didn't know that, now I gotta figure out an alternative
Yep. It's annoying for sure. .net has the feature (as seen in the question / answer I linked to.)
@David, In case you missed it, he already gave you a good alternative :)
1

You can use a regular expression like ([^,]+) in a loop, or just str.split(",") to get all elements at once. The variant str.split("\\s*,\\s*") also tolerates spaces around the commas.
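For example, splitting a single item into its elements might look like the sketch below. The class and method names are hypothetical, and the `trim()` call is an addition so that leading and trailing spaces are dropped as well.

```java
import java.util.Arrays;
import java.util.List;

public class SplitItem {
    // Splits one item such as "30, asdf ,2" into its elements,
    // ignoring whitespace around the commas and at both ends.
    static List<String> elements(String item) {
        return Arrays.asList(item.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        for (String e : elements(" 30 , asdf,2 ")) {
            System.out.println("[" + e + "]");
        }
    }
}
```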


1

Use (^|\s+)(\S*)(($|\s+)\2)+ with the ignore-case option (/i). Tested against:

She left LEft leFT LEFT now

example here - https://regex101.com/r/FEmXui/2

Match 1
Full match  3-23   ` left LEft leFT LEFT`
Group 1     3-4    ` `
Group 2     4-8    `left`
Group 3     18-23  ` LEFT`
Group 4     18-19  ` `
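The same expression can be tried in Java, as sketched below. It assumes the test string was "She left LEft leFT LEFT now", which is what the offsets in the match listing imply; the class and method names are made up. `Pattern.CASE_INSENSITIVE` plays the role of the /i option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedWordDemo {
    // Java spelling of (^|\s+)(\S*)(($|\s+)\2)+ with the ignore-case flag.
    static final Pattern REPEATED =
            Pattern.compile("(^|\\s+)(\\S*)(($|\\s+)\\2)+", Pattern.CASE_INSENSITIVE);

    // Returns the Matcher positioned on the first match, or null if there is none.
    static Matcher firstMatch(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = firstMatch("She left LEft leFT LEFT now");
        if (m != null) {
            System.out.println("Full match " + m.start() + "-" + m.end());
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("Group " + i + " " + m.start(i) + "-" + m.end(i)
                        + " `" + m.group(i) + "`");
            }
        }
    }
}
```

Group 3 is repeated by the `+` quantifier, so only its final repetition is reported; the earlier repetitions are covered by the full match but not available as separate groups.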


1

Using an ANTLR grammar can solve this problem. This is really beyond the reasonable capabilities of regular expressions, although I believe some newer versions of Microsoft's implementation in .NET support this behavior. See this other SO question. If you're on anything other than .NET, your best option is going to be a parser generator (you don't have to use ANTLR; that's just my personal preference). The ANTLR4 GitHub page is a good starting point for matching more complex expressions with things like repeating match groups.

Another option that doesn't require much new learning is to tokenize the input string yourself and pull out the pieces you want, but this can get extremely messy and produce nightmarish chunks of parsing code that are better suited to a generated parser.
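For input as simple as the question's, the hand-tokenizing route can stay small. The following is only a sketch under the assumption that parentheses are never nested; the class and method names are hypothetical, and everything outside the parentheses (such as the separating commas) is ignored rather than validated.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupScanner {
    // Walks the input once and returns the text between each '(' and its ')'.
    // Rejects nested, unmatched, or unclosed parentheses.
    static List<String> scan(String input) {
        List<String> groups = new ArrayList<>();
        int open = -1; // index of the currently open '(' or -1 if none
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                if (open >= 0) {
                    throw new IllegalArgumentException("nested '(' at index " + i);
                }
                open = i;
            } else if (c == ')') {
                if (open < 0) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                groups.add(input.substring(open + 1, i));
                open = -1;
            }
        }
        if (open >= 0) {
            throw new IllegalArgumentException("unclosed '(' at index " + open);
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : scan("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(g);
        }
    }
}
```

Once the grammar grows beyond this (nesting, quoting, escapes), the bookkeeping multiplies quickly, which is where a generated parser starts to pay off.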


-1

This may be a solution:

package com.drl.fw.sch;

// Matcher and SimpleStringMatcher are presumably types from this package (not shown here);
// this Matcher is not java.util.regex.Matcher, which has no match(String) method.
public class AngularJSMatcher extends SimpleStringMatcher {

    // Matches the alternate spelling of the name (camelCase <-> kebab-case); null if there is none.
    Matcher delegate;

    public AngularJSMatcher(String lookFor) {
        super(lookFor);
        if (lookFor.indexOf('-') >= 0) {
            // Kebab-case input such as "ng-repeat": build the camelCase form "ngRepeat".
            StringBuilder sb = new StringBuilder();
            boolean first = true;
            for (String s : lookFor.split("-")) {
                if (first) {
                    sb.append(s);
                    first = false;
                } else if (s.length() > 1) {
                    sb.append(s.substring(0, 1).toUpperCase());
                    sb.append(s.substring(1));
                } else {
                    sb.append(s.toUpperCase());
                }
            }
            delegate = new SimpleStringMatcher(sb.toString());
        } else {
            // camelCase input such as "ngRepeatStart": split before capitals and join with '-'.
            String[] words = lookFor.split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");
            if (words.length > 1) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < words.length; i++) {
                    sb.append(words[i].toLowerCase());
                    if (i < words.length - 1) {
                        sb.append("-");
                    }
                }
                delegate = new SimpleStringMatcher(sb.toString());
            }
        }
    }

    // Matches if either the original spelling or the alternate spelling matches.
    @Override
    public boolean match(String in) {
        if (super.match(in)) {
            return true;
        }
        return delegate != null && delegate.match(in);
    }

    public static void main(String[] args) {
        String lookFor = "ngRepeatStart";

        Matcher matcher = new AngularJSMatcher(lookFor);

        System.out.println(matcher.match("<header ng-repeat-start=\"item in items\">"));
        System.out.println(matcher.match("var ngRepeatStart=\"item in items\">"));
    }
}

