How to match array items with regex

Question

How can I match the names in an array such as this one:

[milk, pumpkin pie, eggs, juice]

It must also support single items in arrays. This works, but is crashing Reggy (a regex program) probably because the constant lookaheads are a strain.

([^,\[\]]+(?=,|\s*\]))

Also note that I don't want to capture the commas or square brackets - just the items...

EDIT:

I've gotten: (?<=\[)([^\[\]]+)(?=,|\]) which matches the items and commas.

This will match any character that isn't in that char class. I don't want it to match surrounding text.
– Aram Kocharyan, Commented Feb 23, 2012 at 7:02
Do you mean you have a string such as "[milk, pumpkin pie, eggs, juice]" and you want to retrieve the names without the [ and , characters?
– steveyang, Commented Feb 23, 2012 at 7:07
You could think of it like that, I'm looking to match those. But there may be surrounding text around the array.
– Aram Kocharyan, Commented Feb 23, 2012 at 7:09

jogojapan · Accepted Answer · 2012-02-23 07:09:18Z

1

As far as I can tell (and I tried using Python and its built-in regular expressions), there is nothing wrong with your regex. If it causes Reggy to crash, that is probably a bug and should be reported as such.

However, note that your regex, while it keeps the commas and brackets out of the matches, does include the spaces between a comma and the beginning of the next item. For example, you will get " pumpkin pie" (note the leading space), rather than "pumpkin pie" as a match. I don't see any direct way to avoid this.
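To make the leading-space behaviour concrete, here is a minimal sketch that runs the pattern from the question through Python's `re` module. The variable names are only illustrative, and the `strip()` step is one possible post-processing workaround, not something the pattern does itself.

```python
import re

text = '[milk, pumpkin pie, eggs, juice]'

# The pattern from the question, unchanged.
original = re.compile(r'([^,\[\]]+(?=,|\s*\]))')

# findall() returns the contents of the single capture group for each match.
matches = original.findall(text)
print(matches)   # every item after the first keeps its leading space

# Possible workaround (an assumption, not part of the regex): trim afterwards.
trimmed = [m.strip() for m in matches]
print(trimmed)
```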

One way, but possibly not supported by Reggy, is to use groups to sub-select the relevant parts of matches. For example in Python:

import re
text    = '[milk, pumpkin pie, eggs, juice]'
pattern = re.compile(r'\s*([^,\[\]]+)(?=,|\s*\])')

for match in pattern.finditer(text):
    print(match.group(1))

Note how the regex now consumes optional leading whitespace (\s*) outside the group and puts round brackets around the relevant part of the match: ([^,\[\]]+). In the printing part I refer to this as group(1).

answered Feb 23, 2012 at 7:09

jogojapan



2 Comments

jogojapan Over a year ago

Added note: Reggy seems to support groups. See the release notes reggyapp.com/release_notes/#1.3

Aram Kocharyan Over a year ago

Thanks, I wasn't too fussed about whitespace, but your variant works too.

richardtallent · Accepted Answer · 2012-02-23 06:42:13Z

1

Here's what I would suggest in .NET:

(?<=\[(?:[^\]]+,\s+)?) // Look behind for the start bracket and possibly previous values
([^\],]+)              // capture the value until the next comma or end bracket 
(?=,|])                // Look ahead and find a comma or end bracket

(Broken into multiple lines for clarity only.)

The issue with using JavaScript's variant of regex is the lack of a zero-width positive look-behind assertion, which is needed if you want to match more than one element of the array.
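The pattern above relies on a variable-length look-behind, which Python's built-in `re` module does not accept either. As a rough alternative (a different technique, not the one in this answer), the work can be split into two steps: first find the bracketed span, then split its contents. The function name `array_items` is hypothetical, and the sketch assumes items never contain commas or nested brackets.

```python
import re

def array_items(text):
    """Return the items of every [a, b, c] style list found in text."""
    items = []
    # Step 1: find each bracketed span; group 1 is the text between [ and ].
    for inner in re.findall(r'\[([^\[\]]*)\]', text):
        # Step 2: split on commas, trim whitespace, and skip empty pieces.
        for part in inner.split(','):
            part = part.strip()
            if part:
                items.append(part)
    return items

print(array_items('buy [milk, pumpkin pie, eggs, juice] today'))
print(array_items('[milk]'))
```

Because only text inside the brackets is ever split, surrounding words are ignored and a single-item array works without special handling.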

answered Feb 23, 2012 at 6:42

richardtallent



Scott Weaver · Accepted Answer · 2012-02-23 07:45:41Z

1

JavaScript lookahead works fine, and this doesn't capture the comma/space sequences:

    // Only assumption is CSV-style items.
    var data = '[milk, pumpkin pie, eggs, juice]';
    var myregexp = /[^,]+(?=,\s?|]$)/g;
    var match = myregexp.exec(data);
    var result = "matches found:\n";
    while (match != null) {
        // The first match includes the "[", and later matches keep the
        // space after the comma, so strip both.
        result += match[0].replace(/^[\[\s]+/, '') + ',\n';
        match = myregexp.exec(data);
    }
    alert(result);

edited Feb 23, 2012 at 7:45

answered Feb 23, 2012 at 7:39

Scott Weaver


1 Comment

Aram Kocharyan Over a year ago

Unfortunately this fails if there is a word after the [...], thanks though
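The failure mentioned in this comment can be reproduced with a short sketch. It uses Python's `re` here rather than JavaScript, on the assumption that the two engines treat this particular pattern the same way; the sample strings are made up for illustration.

```python
import re

# The pattern from the answer, unchanged. An unescaped "]" outside a
# character class is a literal in Python as well.
pattern = re.compile(r'[^,]+(?=,\s?|]$)')

# Array on its own: the "]$" alternative can anchor the last item.
alone = pattern.findall('[milk, eggs, juice]')
print(alone)

# Array followed by a word: "]" is no longer at the end of the string,
# so the last item has nothing to satisfy the lookahead and is dropped.
trailing = pattern.findall('[milk, eggs, juice] today')
print(trailing)
```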

mathematical.coffee · Accepted Answer · 2012-02-23 06:42:42Z

0

Try \b[\w ]+\b.

This will match multiple words allowing spaces in between ([\w ]+). Since the + is greedy it will match as many words as possible, but it won't go over a comma or bracket boundary because neither of those matches \w or a space.

You can play around with it here.
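For a quick check without an online tester, the suggestion can be tried in Python. This is only a sketch with made-up sample strings; the second one shows what happens when words surround the array.

```python
import re

pattern = re.compile(r'\b[\w ]+\b')

# Array on its own: the word boundaries trim the space after each comma.
bare = pattern.findall('[milk, pumpkin pie, eggs, juice]')
print(bare)

# Array inside a sentence: nothing ties the pattern to the brackets,
# so words outside the array are matched as well.
embedded = pattern.findall('buy [milk, eggs] today')
print(embedded)
```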

answered Feb 23, 2012 at 6:42

mathematical.coffee


1 Comment

Aram Kocharyan Over a year ago

Thanks, I should have mentioned that text may surround the array, and that text should not be matched.
