
I have a string which contains some functions (I know their names) and their parameters like this: translate(700 210) rotate(-30)

I would like to parse each one of them into a string array, starting with the function name followed by the parameters.

I don't know much about regex, and so far I have this:

MatchCollection matches = Regex.Matches(attribute.InnerText, @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*");
for (int i = 0; i < matches.Count; i++)
{
    Console.WriteLine(matches[i].Value);
}

What this returns is:

translate(700 210)
[blank space]
rotate(-30)
[blank space]

This works for me because I can run another regular expression on each row of the resulting collection and get the contents. What I don't understand is why blank rows are returned between the methods.

Also, is running a regex twice (once to separate the methods and once to actually parse them) a good approach?

Thanks!

  • Try capturing the tokens you need as named groups. That way you don't have to worry about spaces between them. Commented May 23, 2017 at 14:33
  • When I change the last * to +, or when I eliminate the outer parens and * entirely, I don't get the empty matches. The outermost parens are unnecessary anyway -- in fact, worse than unnecessary. It matches the regex you give it as many times as possible. Your regex, "zero or more of this stuff", happens to match nothing. Commented May 23, 2017 at 14:43
  • Thank you both for replying! I'm not familiar with named groups in regex. I know that I can "save" patterns within the pattern for later use like this (..)(...)\1\2 but if this is what I need, I don't really know how to implement it in my pattern. Yes! Removing the outer brackets and the * did solve the blank space issue. Of course - Regex.Matches should look for all possible matches already! This solved the problem! Commented May 23, 2017 at 14:49

2 Answers

Regex.Matches will match your entire regular expression multiple times. It finds one match for the whole thing, then finds the next match for the whole thing.

The outermost parens with * indicate that you're willing to accept zero or more of the preceding group's contents as a match. So when it finds none of them, it happily returns that. That is not your intent. You want exactly one.

The blanks are harmless, but "zero or more" also includes two. Consider this string, with no space between the two functions:

var text = "translate(700 210)rotate(-30)";

That's one match, according to the regex you provided, and each group keeps only its last capture: you'll get "rotate" and "-30", and the translate call is lost. If the missing space is an error, detect it and warn the user. If you're not going to do that, parse it correctly.
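Both effects are easy to reproduce. The following is a minimal sketch, not part of the original answer: the class and method names (EmptyMatchDemo, MatchValues) are made up for illustration, Wrapped is the pattern from the question, and Unwrapped is the same pattern with the outer group and its * removed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Pattern from the question: everything wrapped in ( ... )*.
    public const string Wrapped =
        @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*";

    // The same pattern without the outer group and its *.
    public const string Unwrapped =
        @"(translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\)";

    // Hypothetical helper: returns the text of every match, in order.
    public static string[] MatchValues(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // With a space between the calls, the wrapped pattern also yields
        // empty matches wherever "zero repetitions" is the best it can do.
        foreach (var value in MatchValues("translate(700 210) rotate(-30)", Wrapped))
            Console.WriteLine($"[{value}]");

        // Without the space, both calls are swallowed by a single match.
        foreach (var value in MatchValues("translate(700 210)rotate(-30)", Wrapped))
            Console.WriteLine($"[{value}]");

        // The unwrapped pattern returns one match per call in both cases.
        foreach (var value in MatchValues("translate(700 210)rotate(-30)", Unwrapped))
            Console.WriteLine($"[{value}]");
    }
}
```

Printing each value inside brackets makes the empty matches visible as [].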

So let's get rid of the outermost parens and that *. We'll also name the capturing groups, for readability.

var matches = Regex.Matches(text, @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?<param>-?\s*\d+\s*\,*\s*)+\)");

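foreach (Match match in matches)
{
    if (match.Groups["funcName"].Success)
    {
        var funcName = match.Groups["funcName"].Value;
        // Group.Value holds only the last capture of a repeated group, so this
        // is the final parameter; match.Groups["param"].Captures has them all.
        var param = Int32.Parse(match.Groups["param"].Value);

        Console.WriteLine($"{funcName}( {param} )");
    }
}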

I also stuck in \s* after the optional -, just in case.
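To get what the question asks for (a string array per function, name first, then every parameter), the Captures collection of the repeated group can be walked. This is only a sketch under a few assumptions: TransformParser and Parse are hypothetical names, and the pattern is a variation on the one above in which the param group wraps just the number, with separators moved outside the group so each capture needs no trimming. Like the original, it handles integers only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TransformParser
{
    // Variation on the pattern above (an assumption, not the original):
    // whitespace and the optional comma sit outside the named group.
    private static readonly Regex Pattern = new Regex(
        @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?:(?<param>-?\d+)\s*,?\s*)+\)");

    // Returns one string array per function call: name first, then parameters.
    public static List<string[]> Parse(string text)
    {
        var result = new List<string[]>();
        foreach (Match match in Pattern.Matches(text))
        {
            var parts = new List<string> { match.Groups["funcName"].Value };

            // Captures keeps every repetition of the group, not just the last.
            foreach (Capture capture in match.Groups["param"].Captures)
                parts.Add(capture.Value);

            result.Add(parts.ToArray());
        }
        return result;
    }

    public static void Main()
    {
        foreach (var call in Parse("translate(700 210) rotate(-30)"))
            Console.WriteLine(string.Join(" | ", call));
        // prints:
        // translate | 700 | 210
        // rotate | -30
    }
}
```

A single pass is enough here, so the second regex from the question is not needed.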


2 Comments

This is excellent! Exactly what I was asking about! I see how named groups are used now. This was very useful to me; I wasn't aware of them. And now, using match.Groups["param"].Captures, I can get all of the parameter values! I have only one more question: why is there a "?" before <funcName> and one before <param>? I know that ? matches 0 or 1 occurrences, but what does it change in this case? I tried removing those to see what happens, and the regex didn't match anything.
The ? there is a different ?; ?<name> is how you indicate a group name.

I like using Regex with a dictionary

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication56
{
    class Program
    {
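        static void Main(string[] args)
        {
            // Maps each function name to its raw parameter string.
            Dictionary<string, string> dict = new Dictionary<string, string>();

            string input = "translate(700 210) rotate(-30)";
            // \w+ instead of [^\(]+ so the key does not pick up the space before "rotate".
            string pattern = @"(?'command'\w+)\s*\((?'value'[^\)]+)\)";

            MatchCollection matches = Regex.Matches(input, pattern);

            foreach (Match match in matches.Cast<Match>())
            {
                // Add throws ArgumentException if the same function name appears twice.
                dict.Add(match.Groups["command"].Value, match.Groups["value"].Value);
            }
        }
    }
}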

2 Comments

I think the order of operations matters here. And what if the user wants to translate a second time, with different parameters, after rotating?
I can change dict to Dictionary<string,List<string>>
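If the order of the calls matters, as the first comment suggests, a list of name/value pairs keeps both the order and any repeated function names. A rough sketch of that idea, assuming the hypothetical names OrderedCommands and Parse, and using \w+ for the function name so that entries carry no leading space:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OrderedCommands
{
    // Returns the calls in the order they appear; duplicates are kept.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var commands = new List<KeyValuePair<string, string>>();
        var pattern = @"(?'command'\w+)\s*\((?'value'[^\)]+)\)";

        foreach (Match match in Regex.Matches(input, pattern))
        {
            commands.Add(new KeyValuePair<string, string>(
                match.Groups["command"].Value,
                match.Groups["value"].Value.Trim()));
        }
        return commands;
    }

    public static void Main()
    {
        var commands = Parse("translate(700 210) rotate(-30) translate(10 20)");
        foreach (var command in commands)
            Console.WriteLine($"{command.Key}: {command.Value}");
        // prints:
        // translate: 700 210
        // rotate: -30
        // translate: 10 20
    }
}
```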
