
I need to split a string from another system that represents a serialized object. The object itself could have another object of the same type nested as a property. I need a way to split the serialized string into a string array of its top-level elements. For example:

"{1,Dave,2}" should create a string array with 3 elements "1", "Dave", "2".

"{1,{Cat,Yellow},2}" should become an array with 3 elements "1", "{Cat,Yellow}", "2".

"{1,{Cat,{Blue,1}},2}" should become an array with 3 elements "1", "{Cat,{Blue,1}}", "2".

Basically, the nesting could be N levels deep, so potentially I could have something like "{{Cat,{Blue,1}},{Dog,White}}", and my resulting array should have 2 elements: "{Cat,{Blue,1}}" and "{Dog,White}".

I thought of writing a custom parser to parse the string manually, but this seems like the kind of problem regex was designed to solve. I'm not very good with regex, though, so I would appreciate some pointers from the regex pros out there.

Thanks

  • That is a perfect task for regular expression balancing groups. Commented Jan 16, 2014 at 8:13
  • Must you use regex for this? Commented Jan 16, 2014 at 8:16
  • Do you need to parse nested elements too? Commented Jan 16, 2014 at 8:17
  • Yes, I will need to parse nested elements too, but if I could get the first level working, I can just recursively apply the same logic. Commented Jan 16, 2014 at 8:19
  • In general, regexes are not seen as the appropriate tool for parsing nested structures. Yes, it's possible using some of the extended regex features of .NET, but, in general, parsing algorithms are preferred. A simple one that could be adapted to your problem can be found here: stackoverflow.com/a/5477921/87698 Commented Jan 16, 2014 at 8:31
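
For comparison, the non-regex route mentioned in the last comment might look roughly like the sketch below. It is not the code from the linked answer; it is a minimal depth counter written for the examples in the question. The names `BraceSplitter` and `SplitTopLevel` are made up for illustration, and it assumes the input is always wrapped in one outer pair of braces and contains no escaped or quoted braces.

```csharp
using System;
using System.Collections.Generic;

static class BraceSplitter
{
    // Hypothetical helper: splits "{a,{b,c},d}" into its top-level elements.
    public static List<string> SplitTopLevel(string input)
    {
        if (input.Length < 2 || input[0] != '{' || input[input.Length - 1] != '}')
            throw new FormatException("Expected a value wrapped in { }.");

        var parts = new List<string>();
        int depth = 0;   // nesting depth inside the outer braces
        int start = 1;   // index where the current element begins

        // Walk the characters between the outer braces.
        for (int i = 1; i < input.Length - 1; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}')
            {
                if (--depth < 0)
                    throw new FormatException("Unbalanced '}' at index " + i);
            }
            else if (c == ',' && depth == 0)
            {
                // A comma outside any nested braces ends the current element.
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        if (depth != 0)
            throw new FormatException("Unbalanced '{'.");

        // Whatever remains before the closing brace is the last element.
        parts.Add(input.Substring(start, input.Length - 1 - start));
        return parts;
    }

    static void Main()
    {
        foreach (string part in SplitTopLevel("{1,{Cat,{Blue,1}},2}"))
            Console.WriteLine(part);
    }
}
```

For "{1,{Cat,{Blue,1}},2}" this yields the three elements "1", "{Cat,{Blue,1}}" and "2". Nested elements can be handled by calling the same method again on any element that starts with a brace.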

2 Answers


Well, you can use this split which makes use of balancing groups:

,(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

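It matches a comma only when everything after it, up to the end of the string, consists of non-brace characters and balanced {...} groups, i.e. a comma that is not inside any braces.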

In code:

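// Requires: using System; using System.Text.RegularExpressions;
string msg = "{1,{Cat,{Blue,1}},2}";
// Strip only the outermost pair of braces.
msg = msg.Substring(1, msg.Length - 2);
// Split on top-level commas: those followed only by balanced {...} groups.
string[] parts = Regex.Split(msg, @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");
foreach (string s in parts)
{
    Console.WriteLine(s);
}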

Output:

1
{Cat,{Blue,1}}
2



Brief explanation:

(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

The whole thing is one big lookahead, so only the comma itself is consumed by the split.

[^{}]* will match any characters except {} any number of times.

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!)) will match {} groups with any level of nesting.

It first matches an opening { and pushes it onto a capture stack named O (I chose it to mean 'opening') here:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
           ^

Then any characters except braces:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
             ^^^^^^

And repeat that group to accommodate nesting:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                    ^

This part matches a closing } and pops one entry off O, balancing an opening brace:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                        ^^^^^^^^

It is followed by any non-brace characters (matched lazily), and the pair is repeated to cater for nesting:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                ^^^^^^^ ^

All of this repeats zero or more times:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                          ^

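The final (?(O)(?!)) is a conditional: if anything is still left on the O stack (an unmatched opening brace), the empty negative lookahead (?!) forces the match to fail, which ensures there are no unbalanced braces. The trailing $ then requires the lookahead to reach the end of the string.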
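
Since the question's comments mention applying the same logic recursively, here is a rough sketch of how the split could be wrapped and reused for nested elements. The names `NestedSplit`, `SplitTopLevel` and `Dump` are made up for illustration, and the sketch assumes every value passed in is wrapped in one outer pair of braces and that the braces are balanced.

```csharp
using System;
using System.Text.RegularExpressions;

static class NestedSplit
{
    // Same pattern as in the answer: a comma followed only by balanced {...} groups.
    static readonly Regex TopLevelComma = new Regex(
        @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");

    // Hypothetical helper: strips the outer braces, then splits on top-level commas.
    public static string[] SplitTopLevel(string value)
    {
        string inner = value.Substring(1, value.Length - 2);
        return TopLevelComma.Split(inner);
    }

    // Prints every element, indenting by nesting depth and recursing into nested objects.
    static void Dump(string value, int depth)
    {
        foreach (string part in SplitTopLevel(value))
        {
            Console.WriteLine(new string(' ', depth * 2) + part);
            if (part.Length > 0 && part[0] == '{')
                Dump(part, depth + 1);
        }
    }

    static void Main()
    {
        Dump("{1,{Cat,{Blue,1}},2,{Dog,5}}", 0);
    }
}
```

Compiling the pattern once into a static `Regex` avoids re-parsing it on every recursive call. The pattern itself is unchanged from the answer.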


3 Comments

Very close, but it doesn't parse a string like "{1,{Cat,{Blue,1}},2,{Dog,5}}". Still helpful. Thanks.
@Kiwik Oh, that's because of Trim (which trims all the } at the end), not really because of the regex. I changed it to use Substring instead.
Yep, that nails it. Thank you. Marked as accepted. And thank you for such a detailed explanation.

It's not a Split, but if you use the following expression with Match, you'll either get a failed match or one with your individual values in m.Groups[1].Captures:

^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$
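
A minimal sketch of how the captures could be read out is below. The class and method names (`CaptureSplit`, `Split`) are hypothetical, and returning null on a failed match is just one possible convention.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CaptureSplit
{
    // The expression from this answer; group 1 captures one top-level element per iteration.
    static readonly Regex Elements = new Regex(
        @"^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$");

    // Hypothetical helper: returns the top-level elements, or null if the match fails.
    public static string[] Split(string value)
    {
        Match m = Elements.Match(value);
        if (!m.Success)
            return null;

        // Each repetition of group 1 is kept as a separate Capture.
        return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    static void Main()
    {
        string[] parts = Split("{1,{Cat,{Blue,1}},2}");
        if (parts != null)
        {
            foreach (string part in parts)
                Console.WriteLine(part);
        }
    }
}
```

Unlike the Split-based approach, the outer braces do not need to be stripped first, because the expression matches them itself.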

1 Comment

m.Groups[1].Captures provided what I needed. Thank you very much. However, @Jerry's answer was the more complete solution, hence I'm marking his as accepted.
