
Sample data: !!Part|123456,ABCDEF,ABC132!!

The comma-delimited list can contain any number of entries, each made of any combination of letters and digits.

I want a regex that matches each entry in the comma-separated list.

What I have is: !!PART\|(\w+)(?:,{1}(\w+))*!!

This seems to do the job. The problem is that I want to retrieve the entries, in order, into an ArrayList or similar, so for the sample data I would want:

  • 1 - 123456
  • 2 - ABCDEF
  • 3 - ABC132

The code I have is:

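using System.Collections;
using System.Text.RegularExpressions;

string partRegularExpression = @"!!PART\|(\w+)(?:,{1}(\w+))*!!";
Match match = Regex.Match(tag, partRegularExpression);
ArrayList results = new ArrayList();

// Copy the value of every group in the match into the list
foreach (Group group in match.Groups)
{
    results.Add(group.Value);
}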

But that's giving me unexpected results. What am I missing?

Thanks

Edit: One solution would be to use a regex like !!PART\|(\w+(?:,??\w+)*)!! to capture the whole comma-separated list and then split it, as suggested by Marc Gravell.
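A minimal sketch of that capture-then-split approach, written as a C# 9+ top-level program. RegexOptions.IgnoreCase is an assumption added here, because the pattern says "PART" while the sample data says "Part" and .NET regexes are case-sensitive by default.

```csharp
using System;
using System.Text.RegularExpressions;

string tag = "!!Part|123456,ABCDEF,ABC132!!";

// Group 1 captures the whole comma-separated list as one string.
// IgnoreCase is an assumption so that "PART" in the pattern matches "Part".
Regex listRegex = new Regex(@"!!PART\|(\w+(?:,??\w+)*)!!", RegexOptions.IgnoreCase);

Match match = listRegex.Match(tag);

// Split the single captured group on commas; no match gives an empty array.
string[] entries = match.Success
    ? match.Groups[1].Value.Split(',')
    : Array.Empty<string>();

foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```

The regex only has to locate the list; the ordering of the entries comes for free from Split.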

I am still curious about a working regex for this, however :o)

  • Will your data always look similar to this? The need for a regex isn't immediately evident here. You could strip off the exclamation points, split on "|", and then split once more on "," to get an array directly. Commented Dec 15, 2008 at 13:40
  • Are you sure that you want to use a RegEx? "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Commented Dec 15, 2008 at 13:41
  • Nice quote. :o) I'm handling a number of (admittedly simpler) tags with Regex and they're fine, so when I hit this one I continued down that route. Commented Dec 15, 2008 at 13:47
  • The data would be similar: the start and end would always be the same, and the comma-separated list could have anywhere from one to x entries. Commented Dec 15, 2008 at 13:48
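The regex-free route suggested in the first comment could look roughly like this (a sketch as a C# 9+ top-level program). It assumes no entry itself begins or ends with "!", since Trim('!') removes every leading and trailing exclamation point.

```csharp
using System;

string tag = "!!Part|123456,ABCDEF,ABC132!!";

// Strip the surrounding "!!" markers (Trim removes all leading/trailing '!').
string inner = tag.Trim('!');               // "Part|123456,ABCDEF,ABC132"

// Split at most once on '|' to separate the tag name from the list.
string[] halves = inner.Split(new[] { '|' }, 2);
string name = halves[0];                    // "Part"

// Then split the list on ',' to get the entries in order.
string[] entries = halves.Length > 1 ? halves[1].Split(',') : Array.Empty<string>();

Console.WriteLine(name);
foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```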

4 Answers


You can either use split:

string csv = tag.Substring(7, tag.Length - 9);
string[] values = csv.Split(new char[] { ',' });

Or a regex:

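using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex csvRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!");
List<string> valuesRegex = new List<string>();
// Each repetition of the outer group adds one entry to Groups["value"].Captures
foreach (Capture capture in csvRegex.Match(tag).Groups["value"].Captures)
{
    valuesRegex.Add(capture.Value);
}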
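Wrapped in a small helper, the same pattern can be checked against both the multi-entry sample and a single-entry tag. This is a sketch as a C# 9+ top-level program; ExtractValues is a hypothetical name, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as the answer; "value" is captured once per repetition.
Regex csvRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!");

// Hypothetical helper: returns every capture of "value", in order.
List<string> ExtractValues(string input)
{
    var values = new List<string>();
    Match m = csvRegex.Match(input);
    if (!m.Success)
    {
        return values; // no match: return an empty list
    }
    foreach (Capture capture in m.Groups["value"].Captures)
    {
        values.Add(capture.Value);
    }
    return values;
}

List<string> many = ExtractValues("!!Part|123456,ABCDEF,ABC132!!");
List<string> single = ExtractValues("!!Part|123456!!");

Console.WriteLine(string.Join(" / ", many));   // 123456 / ABCDEF / ABC132
Console.WriteLine(string.Join(" / ", single)); // 123456
```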

4 Comments

Just threw my unit tests against this and it passed them all. Thanks :o)
It is interesting to see that this regex solution is slightly faster than mine. As usual though, the split version is fastest of all.
Over 1 million iterations I'm getting 0.54 sec for my regex, 0.44 sec for this one and 0.10 sec for the split. Both regexes are compiled.
For one thing, your regex uses a non-greedy wildcard match, which requires quite a bit of backtracking.
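A rough way to reproduce that comparison is sketched below as a C# 9+ top-level program. It assumes "my RegEx" refers to the lazy .*? pattern from another answer in this thread, uses Stopwatch, and runs 100,000 iterations rather than 1 million; it is not a rigorous benchmark, so expect different absolute numbers on other machines.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

const string Tag = "!!Part|123456,ABCDEF,ABC132!!";
const int Iterations = 100_000; // the comment above used 1,000,000

// Both regexes compiled, as in the comment.
var lazyRegex = new Regex(@"(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);
var captureRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!", RegexOptions.Compiled);

// One Match per entry, using the lazy wildcard.
List<string> ViaLazyRegex(string input)
{
    var values = new List<string>();
    foreach (Match m in lazyRegex.Matches(input))
        values.Add(m.Groups["value"].Value);
    return values;
}

// A single Match; entries come from the group's Captures.
List<string> ViaCaptures(string input)
{
    var values = new List<string>();
    foreach (Capture c in captureRegex.Match(input).Groups["value"].Captures)
        values.Add(c.Value);
    return values;
}

// No regex: cut off "!!Part|" and "!!", then split on ','.
List<string> ViaSplit(string input) =>
    new List<string>(input.Substring(7, input.Length - 9).Split(','));

// Times one extraction method, after a single warm-up call.
TimeSpan Time(Func<string, List<string>> extract)
{
    extract(Tag);
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
        extract(Tag);
    sw.Stop();
    return sw.Elapsed;
}

Console.WriteLine($"lazy .*? regex: {Time(ViaLazyRegex).TotalSeconds:F2}s");
Console.WriteLine($"Captures regex: {Time(ViaCaptures).TotalSeconds:F2}s");
Console.WriteLine($"Split:          {Time(ViaSplit).TotalSeconds:F2}s");
```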

Unless I'm mistaken, the repeated (\w+) still counts as only one group, so its Value holds just the last entry. I'm guessing you'll need a string.Split(',') to do what you want. Indeed, it looks a lot simpler not to bother with regex at all here. Depending on the data, how about:

        if (tag.StartsWith("!!Part|") && tag.EndsWith("!!"))
        {
            tag = tag.Substring(7, tag.Length - 9);
            string[] data = tag.Split(',');
        }
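To see the "only one group" behaviour concretely, the sketch below (a C# 9+ top-level program) runs the question's pattern and prints the group values, then the group's Captures. RegexOptions.IgnoreCase is an assumption added so that "PART" matches the sample's "Part".

```csharp
using System;
using System.Text.RegularExpressions;

string tag = "!!Part|123456,ABCDEF,ABC132!!";

// The question's pattern; IgnoreCase added (assumption) so "PART" matches "Part".
Match match = Regex.Match(tag, @"!!PART\|(\w+)(?:,{1}(\w+))*!!", RegexOptions.IgnoreCase);

// Groups[0] is the whole match; a repeated group exposes only its LAST capture as Value.
Console.WriteLine(match.Groups[0].Value); // !!Part|123456,ABCDEF,ABC132!!
Console.WriteLine(match.Groups[1].Value); // 123456
Console.WriteLine(match.Groups[2].Value); // ABC132

// Every repetition is still recorded in the group's Captures collection.
foreach (Capture capture in match.Groups[2].Captures)
{
    Console.WriteLine(capture.Value);     // ABCDEF, then ABC132
}
```

This is why looping over match.Groups yields the whole tag, the first entry and the last entry, with the middle entries missing.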

2 Comments

The (?: ) bracket isn't captured, only the (\w+) group inside it. But there's no reason it couldn't be. I guess I'm overcomplicating things by going 100% regex.
A regex of the form !!PART\|(\w+(?:,??\w+)*)!! seems to capture only the comma-separated list. I could then split that single group (if needed).

I think the RegEx you are looking for is this:

(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)

This can then be run like this:

        using System;
        using System.Collections;
        using System.Text.RegularExpressions;

        string tag = "!!Part|123456,ABCDEF,ABC132!!";

        string partRegularExpression = @"(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)";
        ArrayList results = new ArrayList();

        // Each Match covers one entry, ending at a comma or at the closing "!!"
        Regex extractNumber = new Regex(partRegularExpression, RegexOptions.IgnoreCase);
        MatchCollection matches = extractNumber.Matches(tag);
        foreach (Match match in matches)
        {
            results.Add(match.Groups["value"].Value);
        }

        foreach (string s in results)
        {
            Console.WriteLine(s);
        }
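With the pattern as it stands above, the single-entry tag from the comments can be checked alongside the sample. This is a sketch as a C# 9+ top-level program; Extract is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern and options as the answer above.
Regex extractNumber = new Regex(@"(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)", RegexOptions.IgnoreCase);

// Hypothetical helper: one Match per entry, each ending at "," or at the closing "!!".
List<string> Extract(string input)
{
    var results = new List<string>();
    foreach (Match match in extractNumber.Matches(input))
    {
        results.Add(match.Groups["value"].Value);
    }
    return results;
}

Console.WriteLine(string.Join(" / ", Extract("!!PART|123456!!")));               // 123456
Console.WriteLine(string.Join(" / ", Extract("!!Part|123456,ABCDEF,ABC132!!"))); // 123456 / ABCDEF / ABC132
```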

2 Comments

Failed the case !!PART|123456!!: in the ArrayList I had two entries, "!!" and "".
You need to run this slightly differently, see the example code I've added.

The following code

string testString = "!!Part|123456,ABCDEF,ABC132!!";
foreach(string component in testString.Split("|!,".ToCharArray(),StringSplitOptions.RemoveEmptyEntries) )
{
    Console.WriteLine(component);
}

will give the following output

Part
123456
ABCDEF
ABC132

This has the advantage of making the comma-separated entries line up with the index numbers you (possibly by mistake) specified in the original question (1, 2, 3), since "Part" lands at index 0.

HTH

-EDIT- I forgot to mention that this may have drawbacks if a string's format is not as expected above, but then again, anything short of a stupendously complex regex would break just as easily.
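One concrete example of such a drawback: with RemoveEmptyEntries, an empty entry disappears silently and every later entry shifts down an index. The sketch below is a C# 9+ top-level program; SplitTag is a hypothetical helper name and the malformed input is made up for illustration.

```csharp
using System;

// Splitting on '|', '!' and ',' with RemoveEmptyEntries, as in the answer above.
string[] SplitTag(string input) =>
    input.Split("|!,".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

string[] good = SplitTag("!!Part|123456,ABCDEF,ABC132!!");
// Hypothetical malformed input: an empty entry between two commas.
string[] bad = SplitTag("!!Part|123456,,ABC132!!");

Console.WriteLine(good.Length); // 4: Part, 123456, ABCDEF, ABC132
Console.WriteLine(bad.Length);  // 3: the empty entry is silently dropped
Console.WriteLine(bad[2]);      // ABC132, now at index 2 instead of 3
```

A separate validating regex, as mentioned in the comments, avoids this by rejecting such tags before they are split.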

2 Comments

Yeah, should have been 0, 1, 2. D'oh! :o)
I have a separate validating regex for the tag, so by the time I get to extracting the data it will already have been validated, but thanks for the heads-up! :o)
