How can I match the names in an array such as this one:

[milk, pumpkin pie, eggs, juice]

It must also support single-item arrays. This works, but it crashes Reggy (a regex program), probably because the repeated lookaheads are a strain.

([^,\[\]]+(?=,|\s*\]))

Also note that I don't want to capture the commas or square brackets - just the items...

EDIT:

I've now got (?<=\[)([^\[\]]+)(?=,|\]), which matches the items together with the commas.

4
  • Why not use ([^,\[\]\s]+)? Is the space special? Commented Feb 23, 2012 at 6:34
  • That will match any character that isn't in that character class; I don't want it to match the surrounding text. Commented Feb 23, 2012 at 7:02
  • Do you mean you have a string such as "[milk, pumpkin pie, eggs, juice]" and you want to retrieve the names without the [ and ,? Commented Feb 23, 2012 at 7:07
  • You could think of it like that, I'm looking to match those. But there may be surrounding text around the array. Commented Feb 23, 2012 at 7:09

4 Answers

1

As far as I can tell (I tried it with Python and its built-in regular expressions), there is nothing wrong with your regex. If it causes Reggy to crash, that is probably a bug and should be reported as such.

However, note that your regex, while it keeps the commas and brackets out of the matches, does include the spaces between a comma and the beginning of an item. For example, you will get " pumpkin pie" (note the leading space) rather than "pumpkin pie" as a match. I don't see any direct way to avoid this.
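
To see the leading space concretely, here is a small Python check of the regex from the question. It is only a sketch: the sample string is the one from the question, and the variable names are chosen for illustration.

```python
import re

# The regex from the question, unchanged.
pattern = re.compile(r'([^,\[\]]+(?=,|\s*\]))')

text = '[milk, pumpkin pie, eggs, juice]'
matches = pattern.findall(text)

# Every item after the first keeps the space that follows the comma.
print(matches)  # ['milk', ' pumpkin pie', ' eggs', ' juice']
```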

One way, but possibly not supported by Reggy, is to use groups to sub-select the relevant parts of matches. For example in Python:

import re
text    = '[milk, pumpkin pie, eggs, juice]'
pattern = re.compile(r'\s*([^,\[\]]+)(?=,|\s*\])')

for match in pattern.finditer(text):
    # group(1) is the item without the leading whitespace
    print(match.group(1))

Note how the regex now consumes leading whitespace (\s*) outside the group and puts parentheses around the relevant part of the match: ([^,\[\]]+). When printing, I refer to this part as group(1).
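
The question also mentions single-item arrays and text around the array. The following sketch tries the same pattern on those cases; the helper name items and the sample strings are assumptions made up for illustration.

```python
import re

# Same pattern as above: whitespace is consumed outside the group.
pattern = re.compile(r'\s*([^,\[\]]+)(?=,|\s*\])')

def items(text):
    # With exactly one group, findall returns the group(1) strings.
    return pattern.findall(text)

print(items('[milk]'))
print(items('Buy [milk, pumpkin pie, eggs, juice] today'))
```

Text outside the brackets is skipped only because it is not followed by a comma or a closing bracket. In 'Buy [milk] today, please' the word "today" would be matched as well, so this approach is only safe when the surrounding text contains no commas.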


2 Comments

Added note: Reggy seems to support groups. See the release notes reggyapp.com/release_notes/#1.3
Thanks, I wasn't too fussed about whitespace, but your variant works too.
1

Here's what I would suggest in .NET:

(?<=\[(?:[^\]]+,\s+)?) // Look behind for the start bracket and possibly previous values
([^\],]+)              // capture the value until the next comma or end bracket 
(?=,|])                // Look ahead and find a comma or end bracket

(Broken into multiple lines for clarity only.)

The issue with using JavaScript's variant of regex is the lack of a zero-width positive look-behind assertion, which is needed if you want to match more than one element of the array.
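
Python's built-in re module cannot run this pattern either, because it only accepts fixed-width look-behind. One workaround that needs no look-behind at all is to work in two steps: find the bracketed part first, then split it on commas. Note that this is a different technique from the regex above, sketched here in Python; the function name array_items is made up for illustration.

```python
import re

def array_items(text):
    """Return the items of every [...] array found in text."""
    items = []
    # Step 1: capture what sits between a '[' and the next ']'.
    for body in re.findall(r'\[([^\[\]]*)\]', text):
        # Step 2: split on commas and drop the surrounding whitespace.
        items.extend(part.strip() for part in body.split(',') if part.strip())
    return items

print(array_items('Buy [milk, pumpkin pie, eggs, juice] today, please'))
print(array_items('[milk]'))
```

Because the items are only taken from inside the brackets, commas in the surrounding text do not produce extra matches.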

1

JavaScript lookahead works fine, and this doesn't capture the comma/space sequences:

    // only assumption is CSV
    var data = '[milk, pumpkin pie, eggs, juice]';
    var myregexp = /[^,]+(?=,\s?|]$)/g;
    var match = myregexp.exec(data);
    var result = "matches found:\n";
    while (match != null) {
        // the first match includes the [ and later ones a leading space; strip both
        result += match[0].replace(/^\[/, '').trim() + ',\n';
        match = myregexp.exec(data);
    }
    alert(result);

1 Comment

Unfortunately this fails if there is a word after the [...], thanks though
0

Try \b[\w ]+\b.

This will match multiple words, allowing spaces in between ([\w ]+). Since the + is greedy, it will match as many words as possible, but it won't cross a comma or bracket boundary because those characters match neither \w nor a space.

You can play around with it here.
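
As a quick check of this pattern, here is a short Python sketch. The second sample string is an assumption added to show what happens when there is text around the array.

```python
import re

pattern = re.compile(r'\b[\w ]+\b')

# On the bare array, the items come out without brackets, commas or padding.
bare = pattern.findall('[milk, pumpkin pie, eggs, juice]')
print(bare)

# Words outside the brackets match too, since the pattern
# never looks at the brackets themselves.
wrapped = pattern.findall('buy [milk, pumpkin pie] now')
print(wrapped)
```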

1 Comment

Thanks, I should have mentioned that text that should not be matched may surround the array.
