Get string pattern from this string using Regex

Question

I have a string as shown below in my C# app.

Multiply(Sum(3,5,4), Division(4,5,5), Subtract(7,8,9))

Sum(), Division(), and Subtract() are different methods nested inside Multiply().

Is there any way to get each one separately, i.e. Sum(3,5,4), Division(4,5,5), Subtract(7,8,9) and Multiply(), using C# Regex methods?

Sum, Division, Subtract and Multiply are fixed keywords.

Are you planning to nest them further? Say, Multiply(Multiply(Multiply(1,2),Multiply(3,4)),Multiply(5,6))?
– Sergey Kalinichenko, Commented Jan 9, 2012 at 14:00
My solution below handles nesting. Because nesting is needed, you can't do it with one simple invocation of a Regex method; you'll have to use a loop. It shouldn't be a big deal.
– Paul Eastlund, Commented Jan 9, 2012 at 15:40

Paul Eastlund · Accepted Answer · 2012-01-09 14:15:14Z

1

If the nesting is arbitrarily deep, you should do this iteratively with something like Regex.Matches() and Regex.Replace().

Make a copy of your whole string. Use ([a-zA-Z]+\([0-9, ]*\))(, )? as the regular expression. That will match all of the lowest-level function calls, i.e. all of the leaf nodes of your call graph.

Call Regex.Matches to extract all of the matches, then call Regex.Replace to remove them from the string copy. That strips all the leaf nodes of the call graph. Call Matches() and Replace() again to strip the next level of calls up, and keep repeating until the string copy is empty.
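The passes described above can be sketched as follows. This is a minimal illustration in Python rather than C#, assuming the pattern ([a-zA-Z]+\([0-9, ]*\))(, )? from this answer; re.finditer and re.sub play the roles of Regex.Matches and Regex.Replace. The name peel_calls and the early exit when nothing matches are assumptions added for the sketch.

```python
import re

# Leaf call: a name followed by parentheses that hold only digits, commas and
# spaces, plus an optional trailing ", " separator (the pattern from the answer).
LEAF = re.compile(r"([a-zA-Z]+\([0-9, ]*\))(, )?")

def peel_calls(expression):
    """Return the calls found on each pass, innermost level first.

    peel_calls is a hypothetical helper name used only for this sketch.
    """
    remaining = expression
    levels = []
    while remaining:
        found = [m.group(1) for m in LEAF.finditer(remaining)]
        if not found:
            # Nothing left that looks like a leaf call: stop rather than loop forever.
            break
        levels.append(found)
        # Remove this level so the next pass sees the enclosing calls as leaves.
        remaining = LEAF.sub("", remaining)
    return levels

levels = peel_calls("Multiply(Sum(3,5,4), Division(4,5,5), Subtract(7,8,9))")
for depth, calls in enumerate(levels):
    print(depth, calls)
```

Note that an outer call is reported only after its arguments have been stripped (Multiply() in this input), so if the original arguments matter you would need to record positions or substitute placeholders instead of deleting.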

edited Jan 9, 2012 at 14:15

Jon Egerton


answered Jan 9, 2012 at 14:10

Paul Eastlund


1 Comment

Derreck Dean Over a year ago

Start with \w+\((\d,?)+\) and make passes over the string, replacing each match found with the answer you calculate from the arguments passed in. Keep repeating this until the only thing left in the string is the answer.

Sergey Kalinichenko · Accepted Answer · 2012-01-09 14:33:48Z

1

You cannot handle arbitrary nesting with regular expressions: it is impossible even in theory, because the regular-expression model cannot describe balanced parentheses of arbitrary depth.

What you need in this case is a parser. It does not require much work to build a very simple recursive descent parser manually, but once the complexity becomes considerable, you should switch to a parser generator. My personal favorite is ANTLR, but you have lots of other choices.
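To give an idea of how little code a hand-written recursive descent parser needs for this input, here is a minimal sketch in Python (the same structure carries over to C#). The grammar, the function names tokenize, parse, parse_expr and expect, and the tuple-based tree are all assumptions made for illustration; only non-negative integers and function calls are handled.

```python
import re

# One token: an identifier, an integer, or one of ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[(),])", re.ASCII)

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def expect(tokens, pos, wanted):
    if pos >= len(tokens) or tokens[pos] != wanted:
        raise ValueError(f"expected {wanted!r} at token {pos}")

def parse_expr(tokens, pos):
    """expr := NUMBER | NAME '(' [ expr (',' expr)* ] ')'"""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return int(tok), pos + 1
    if not (tok[0].isalpha() or tok[0] == "_"):
        raise ValueError(f"unexpected token {tok!r}")
    name = tok
    pos += 1
    expect(tokens, pos, "(")
    pos += 1
    args = []
    if pos < len(tokens) and tokens[pos] != ")":
        while True:
            # Each argument is itself an expression, hence the recursion.
            arg, pos = parse_expr(tokens, pos)
            args.append(arg)
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
                continue
            break
    expect(tokens, pos, ")")
    return (name, args), pos + 1

def parse(text):
    tokens = tokenize(text)
    node, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return node

tree = parse("Multiply(Sum(3,5,4), Division(4,5,5), Subtract(7,8,9))")
print(tree)
```

Each call becomes a (name, arguments) pair, so nesting of any depth falls out of the recursion in parse_expr without any special handling.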

answered Jan 9, 2012 at 14:33

Sergey Kalinichenko


shift66 · Accepted Answer · 2012-01-09 14:12:13Z

0

Yes, as long as you don't pass another method call as a parameter to your methods
(like Sum(2, Sum(3,2), 4)).
In that case you can use this pattern:
^\w+\((.*)\)$ then take group 1 (the (.*) group), which holds the parameters (Sum(3,5,4), Division(4,5,5), Subtract(7,8,9)), and then apply this pattern to that group to find each parameter:
\w+\(.*?\)
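As a rough illustration of this two-step approach, here is a short Python sketch (re.match and re.findall stand in for Regex.Match and Regex.Matches). It assumes the outer pattern ^\w+\((.*)\)$ and a lazy inner pattern \w+\(.*?\); the lazy quantifier and the name split_flat_call are assumptions of the sketch, and the last line shows how the result goes wrong once the inner calls are nested.

```python
import re

def split_flat_call(text):
    """Return (parameter string, list of inner calls) for a single outer call.

    split_flat_call is a hypothetical helper; it only works when the inner
    calls do not contain further calls.
    """
    outer = re.match(r"^\w+\((.*)\)$", text)
    if outer is None:
        raise ValueError("input is not of the form Name(...)")
    params = outer.group(1)                       # everything between the outer parentheses
    inner = re.findall(r"\w+\(.*?\)", params)     # each Name(...) up to the first ')'
    return params, inner

params, inner = split_flat_call("Multiply(Sum(3,5,4), Division(4,5,5), Subtract(7,8,9))")
print(params)
print(inner)

# With a nested inner call, the lazy match stops at the first ')' and is unbalanced.
_, broken = split_flat_call("Multiply(Sum(1, Sum(2,3)), 4)")
print(broken)
```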

If the methods inside Multiply may themselves contain nested methods, a regex can't help you. In that case you should count parentheses to see which call each closing parenthesis closes.
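Counting parentheses could look something like the following Python sketch, which splits one call into its top-level arguments by tracking the nesting depth. The function name split_top_level_args and the error handling are assumptions made for illustration; nested arguments are returned as strings and can be split again with the same function.

```python
def split_top_level_args(call):
    """Split 'Name(arg, arg, ...)' into (name, [args]) using a depth counter."""
    call = call.strip()
    open_idx = call.index("(")          # raises ValueError if there is no '('
    if not call.endswith(")"):
        raise ValueError("call must end with ')'")
    name = call[:open_idx].strip()
    body = call[open_idx + 1:-1]

    args, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        elif ch == "," and depth == 0:
            # A comma outside any inner parentheses separates two arguments.
            args.append(body[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '('")

    last = body[start:].strip()
    if last or args:
        args.append(last)
    return name, args

name, args = split_top_level_args(
    "Multiply(Sum(3,5,4), Division(4, Sum(3,5,4), 5), Subtract(7,8,9))"
)
print(name, args)
```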

edited Jan 9, 2012 at 14:12

answered Jan 9, 2012 at 14:06

shift66


user557597 · Accepted Answer · 2012-01-10 00:20:56Z

C# should be able to match balanced text in regular expressions (.NET provides balancing groups for this rather than Perl-style recursion). The only problem is that, I think, it retains the outer match as a whole. Parsing the inner contents (between the parentheses) further needs a recursive function call, picking off the tokens each time.

I agree with @dasblinkenlight, though, about needing a decent parser. As he says, the complexity can quickly become considerable.

The regex below is from Perl. Note that .NET has no direct equivalent of the (?1) recursion or (?| branch-reset constructs it uses, so a .NET port would need balancing groups instead.
As you can see, the regex is like a sieve in that the general form is adhered to, but
only commas and digits are handled between Math tokens, allowing the rest to fall through.

But if this is the only thing you care about, then it should work. You'll notice that even though you can parse it into a data structure (as below), using that structure internally requires yet another recursive "parse" over the data structure (albeit an easier one). If it is only for display or statistical purposes, then it's not a problem.
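To illustrate that second recursive pass, here is a small Python sketch that walks a structure shaped like the dump at the end of this answer (numeric strings and one-key maps of name to argument list) and computes a value. The arithmetic semantics are an assumption, since the question never defines them: each function is taken to fold its arguments left to right. Empty-string entries are skipped, and the names OPERATIONS and evaluate are hypothetical.

```python
from functools import reduce
import operator

# Assumed semantics (not given in the question): fold arguments left to right.
OPERATIONS = {
    "Sum": operator.add,
    "Subtract": operator.sub,
    "Multiply": operator.mul,
    "Division": operator.truediv,
}

def evaluate(node):
    """Evaluate a numeric string or a one-key dict {name: [child nodes]}."""
    if isinstance(node, str):
        return float(node)
    (name, children), = node.items()     # exactly one key per dict is expected
    # Skip the '' placeholders that stand for empty arguments.
    values = [evaluate(child) for child in children if child != ""]
    return reduce(OPERATIONS[name], values)

tree = {"Multiply": [
    {"Sum": ["3", "5", "4"]},
    {"Division": ["4", "5", "5"]},
    {"Subtract": ["7", "8", "9"]},
]}
result = evaluate(tree)
print(result)
```

An unknown name such as _Sum raises a KeyError here, which is probably what you want for anything outside the fixed set of keywords.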

The expanded regex:

 {
    (                                      #1 - Recursion group 1                            
      \b(\w+)\s*                                #2 - Math token
      \(                                        #  - Open parenth                   
         (                                        #3 - Capture between parenth's
           (?:  (?> (?: (?!\b\w+\s*\(|\)) . )+ )     # - Get all up to next math token or close parenth
              | (?1)                                 # - OR, recurse group 1
           )*                                        # - Optionally do many times 
         )                                        # - End capture 3
      \)                                        # - Close parenth
    )                                      # - End recursion group 1
    \s*(\,?)                               #4 - Capture optional comma ','

  |                                    # OR,
                                       # (Here, it is only getting commas and digits, ignoring the rest.
                                       #  Commas ',' are escaped to make them stand out)
    \s*                                       
    (?|                                    # - Start branch reset
        (\d+)\s*(\,?)                          #5,6 - Digits then optional comma ','
      | (?<=\,)()\s*(\,|\s*$)                  #5,6 - Comma behind. No digit then, comma or end
    )                                      # - End branch reset
 }xs;   # Options: expanded, single-line

Here is a rapid prototype in Perl (easier than C#):

 use Data::Dumper;


#//
 my $regex = qr{(\b(\w+)\s*\(((?:(?>(?:(?!\b\w+\s*\(|\)).)+)|(?1))*)\))\s*(\,?)|\s*(?|(\d+)\s*(\,?)|(?<=\,)()\s*(\,|\s*$))}s;


#//
 my $sample = ', asdf Multiply(9, 4, 3, hello,  _Sum(3,5,4,) , Division(4, Sum(3,5,4), 5), ,, Subtract(7,8,9))';

 print_math_toks( 0, $sample );

 my @array;
 store_math_toks( 0, $sample, \@array );
 print Dumper(\@array);


#//
 sub print_math_toks
 {
    my ($cnt, $segment) = @_;
    while ($segment  =~ /$regex/g )
    {
      if (defined $5) {
         next if $cnt < 1;
         print "\t"x($cnt+1), "$5$6\n";
      }
      else {
         ++$cnt;
         print "\t"x$cnt, "$2(\n";
         my $post = $4;

         $cnt = print_math_toks( $cnt, $3 );

         print "\t"x$cnt, ")$post\n";
         --$cnt;
      }
    }
    return $cnt;
 }


 sub store_math_toks
 {
    my ($cnt, $segment, $ary) = @_;
    while ($segment  =~ /$regex/g )
    {
      if (defined $5) {
         next if $cnt < 1;
         if (length $5) {
            push (@$ary, $5);
         }
         else {
            push (@$ary, '');
         }
      }
      else {
         ++$cnt;
         my %hash;
         $hash{$2} = [];
         push (@$ary, \%hash);

         $cnt = store_math_toks( $cnt, $3, $hash{$2} );

         --$cnt;
      }
    }
    return $cnt;
 }

Output:

        Multiply(
                9,
                4,
                3,
                _Sum(
                        3,
                        5,
                        4,

                ),
                Division(
                        4,
                        Sum(
                                3,
                                5,
                                4
                        ),
                        5
                ),
                ,
                ,
                Subtract(
                        7,
                        8,
                        9
                )
        )
$VAR1 = [
          {
            'Multiply' => [
                            '9',
                            '4',
                            '3',
                            {
                              '_Sum' => [
                                          '3',
                                          '5',
                                          '4',
                                          ''
                                        ]
                            },
                            {
                              'Division' => [
                                              '4',
                                              {
                                                'Sum' => [
                                                           '3',
                                                           '5',
                                                           '4'
                                                         ]
                                              },
                                              '5'
                                            ]
                            },
                            '',
                            '',
                            {
                              'Subtract' => [
                                              '7',
                                              '8',
                                              '9'
                                            ]
                            }
                          ]
          }
        ];
