1

I have a string containing n substrings in the following format which I want to match:

{varName:param1, param2, param2}

Requirements are as follows:

  1. Only the varName (inside the curly brackets) is mandatory
  2. No limit on the number of parameters
  3. No restrictions on whitespace inside curly brackets apart from var and param names which must not contain whitespace

I would like to be able to capture the varName and each of the parameters separately.

I have come up with a regex that is nearly there, but not quite. Any help would be appreciated.

  • You say you've got a regex which is "nearly there". Perhaps you could post what you've managed so far? Commented Feb 8, 2011 at 0:54
  • Problem with this is that it matches the varName (group1) the first param (group2) and the last param (group3) only Commented Feb 8, 2011 at 1:07
  • \{([a-zA-Z]+)(?:\s*:\s*([^,\s]+))?(?:\s*,\s*([^,\s]+))*\s*\} Commented Feb 8, 2011 at 1:30
  • As every regex has a fixed number of groups, you can't use it to capture an unlimited number of substrings. You need some sort of loop or split. Commented Feb 8, 2011 at 1:31
  • Well that's that then :-) thx alot. Commented Feb 8, 2011 at 1:36
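
To make the point about fixed groups concrete, here is a small Java sketch that runs the regex from the comment above against a four-parameter input. The class and method names (`RepeatedGroupDemo`, `groups`) are made up for illustration. A capturing group inside a repeated `(...)*` only keeps the text from its last iteration, so the middle parameters are lost.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The regex posted in the comments: group 3 sits inside a repeated (?: ... )* group.
    static final Pattern PATTERN = Pattern.compile(
        "\\{([a-zA-Z]+)(?:\\s*:\\s*([^,\\s]+))?(?:\\s*,\\s*([^,\\s]+))*\\s*\\}");

    // Returns the text of every capturing group for the first match, or null if nothing matches.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints varName, p1 and p4: p2 and p3 were matched but not retained.
        for (String g : groups("{varName:p1, p2, p3, p4}")) {
            System.out.println(g);
        }
    }
}
```

The number of groups is fixed when the pattern is compiled (three here), which is why a loop or a split is needed for an open-ended parameter list.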

6 Answers

2

I'm wondering whether it would be easier to simply use String.split() judiciously, rather than battle with regexps for the above. The delimiters (colons/whitespace/commas) seem well-defined.
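
A minimal sketch of the split-based idea in Java, assuming the `{...}` token has already been cut out of the surrounding text and really does start with `{` and end with `}`. The names `SplitParser` and `parse` are hypothetical, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitParser {
    // Parses one "{varName:param1, param2}" token.
    // Element 0 of the result is the variable name; any further elements are parameters.
    // Assumption: the token starts with '{' and ends with '}'.
    static List<String> parse(String token) {
        String inner = token.substring(1, token.length() - 1).trim();
        List<String> parts = new ArrayList<>();

        int colon = inner.indexOf(':');
        if (colon < 0) {
            // No colon, so there is only a variable name.
            parts.add(inner);
            return parts;
        }

        parts.add(inner.substring(0, colon).trim());
        String rest = inner.substring(colon + 1).trim();
        if (!rest.isEmpty()) {
            // Split on commas, swallowing the whitespace around them.
            parts.addAll(Arrays.asList(rest.split("\\s*,\\s*")));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("{varName:param1, param2, param2}"));
    }
}
```

This does no validation of the names themselves; it only separates the pieces on the delimiters.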


2 Comments

perhaps, though I do need to use a regex to find the whole patterns in the surrounding string, so thought I might as well match the inner parts at the same time
Agreed, regexes really aren't the solution for this problem. Unreadable one-liners (and are there any other regexes?) are horrible to maintain, and the syntax is simple enough to be easily parsed in a few lines.
1
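import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "blah blah\n{varName:param1, param2, param2}\nblah";

// Group 1 is the variable name; group 2 is the whole parameter list.
// The ":params" part is optional, so "{varName}" on its own also matches.
Pattern p = Pattern.compile(
  "\\{\\s*([a-zA-Z]+)(?:\\s*:\\s*([^,\\s}]+(?:\\s*,\\s*[^,\\s}]+)*))?\\s*\\}"
);
Matcher m = p.matcher(s);
while (m.find())  // visit every {...} occurrence in the string
{
  String varName = m.group(1);
  // start(2) is -1 when the optional parameter list did not take part in the match
  String[] params = m.start(2) != -1
                  ? m.group(2).split("[,\\s]+")
                  : new String[0];

  System.out.printf("var: %s%n", varName);
  for (String param : params)
  {
    System.out.printf("param: %s%n", param);
  }
}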

If you're holding out for a way to match the string and break out all the components with one regex, don't bother; this is as good as it gets (unless you switch to Perl 6). As for performance, I wouldn't worry about that until it becomes a problem.

1 Comment

Yep, thanks everyone for your help. I see now that I was barking up the wrong tree.
1

How about a regex and a Scanner?

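import java.util.Scanner;

public class Regex {

  public static void main(String[] args) {
    String string = "{varName: param1, param2, param2}";
    // try-with-resources closes the Scanner when we are done
    try (Scanner scanner = new Scanner(string)) {
      // Any run of whitespace, braces, colons and commas counts as a delimiter,
      // so only the names themselves come back as tokens.
      scanner.useDelimiter("[\\s{:,}]+");
      // The first token is the variable name; the rest are parameters.
      System.out.println("varName: " + scanner.next());
      while (scanner.hasNext()) {
        System.out.println("param: " + scanner.next());
      }
    }
  }
}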
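
If the braces are embedded in a longer string, one way to combine the two ideas is to let a regex find each `{...}` block and hand only that block to the Scanner. This is a rough sketch under the assumption that blocks never nest; `BraceScanner` and `parseAll` are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceScanner {
    // Matches one brace-delimited block that contains no nested braces.
    static final Pattern BLOCK = Pattern.compile("\\{[^{}]*\\}");

    // Returns one token list per block: element 0 is the variable name, the rest are parameters.
    static List<List<String>> parseAll(String text) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            List<String> tokens = new ArrayList<>();
            try (Scanner scanner = new Scanner(m.group())) {
                // Same delimiter set as in the answer above.
                scanner.useDelimiter("[\\s{:,}]+");
                while (scanner.hasNext()) {
                    tokens.add(scanner.next());
                }
            }
            // Skip blocks such as "{}" that contain no name at all.
            if (!tokens.isEmpty()) {
                result.add(tokens);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("a {x: 1, 2} b {y} c"));
    }
}
```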

1 Comment

Interesting that, haven't come across the Scanner class before. Reading up on it now.
1

A quick solution in pseudocode (JavaScript-like syntax):

matches = string.match(/\{(\w+):([\w\s,]+)\}/);
varName = matches[1];
params = matches[2].trim().split(/\s*,\s*/);

1 Comment

yes, thought of that and this post was last hurrah before giving up and doing that. Worried a little about the performance of a separate split because it will be quite high volume - though, tbh, have no idea about that.
0

Post what you have so far. You can test it very easily on this website: http://www.regexplanet.com/simple/index.html

1 Comment

Added what I have got so far above
0

Ok I've got a solution in regex that seems to work just fine:

\{\s*([^\{\}:,\s]+)\s*(?:(?::\s*([^\{\}:,\s]+)\s*)(?:,\s*([^\{\}:,\s]+)\s*)*)?\}

Or to even keep the pretense of being able to understand it:

name = [^\{\}:,\s]+

ws = \s*

\{ws(name)ws(?:(?::ws(name)ws)(?:,ws(name)ws)*)?\}

I wouldn't recommend it, but brief testing seems to indicate that it works. A nice brain teaser for 3am ;)
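
For what it's worth, the name/ws decomposition can be written out in Java by concatenating string constants. This is only a sketch: `ComposedRegex`, `NAME`, `WS` and `varName` are illustrative names, and `NAME` here excludes `:` so that the variable name cannot run past the colon in an input like `{varName:param1}`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComposedRegex {
    // A name is any run of characters other than braces, colon, comma and whitespace.
    static final String NAME = "[^{}:,\\s]+";
    // Optional whitespace.
    static final String WS = "\\s*";

    // \{ws(name)ws(?:(?::ws(name)ws)(?:,ws(name)ws)*)?\}
    static final Pattern PATTERN = Pattern.compile(
        "\\{" + WS + "(" + NAME + ")" + WS
        + "(?:(?::" + WS + "(" + NAME + ")" + WS + ")"
        + "(?:," + WS + "(" + NAME + ")" + WS + ")*)?\\}");

    // Returns the variable name of the first {...} block found, or null if there is none.
    static String varName(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(varName("text {varName: param1, param2} more text"));
    }
}
```

Group 3 still sits inside a repeated group, so it only holds the last parameter; the pattern is good for locating and validating a block, not for pulling out every parameter.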

PS: If you're comparing the split solution to this or something similar I'd be interested in hearing if there were any performance differences - I don't think regex would be especially performant.
