
I'm a Perl programmer doing a bit of C#. I'm facing an odd issue with Regex.Replace and the zero-or-more quantifier, *.

Say I wanted to replace zero or more letters with a single letter. In Perl, I could do this:

my $s = "A";
$s =~ s/\w*/B/;
print $s;
$s now = "B"

But if I try to do the same in C#, like this:

string s = Regex.Replace("A", @"\w*", "B");
s now = "BB"

The docs do say "The * character is not recognized as a metacharacter within a replacement pattern"

Why? And is there any workaround if you want part of your regex to slurp up some leftover string which may not be there (like ".*?" on the end)?

(this is a silly example, but you get the point)

  • @rich.okelly, does it really matter? Commented Feb 10, 2012 at 12:06
  • Interesting, fyi: ^\w*$ works fine. Commented Feb 10, 2012 at 12:10
  • @ingenu has the answer - should work in Perl and C#. Interesting... Looks like a bug, as if it is treating the EOL as a separate matchable. (The replacement pattern is "B" so that's a red herring.) Commented Feb 10, 2012 at 12:17
  • This has to be a bug: this Regex.Replace(".,A", @"\w*", "B") becomes B.B,BB Commented Feb 10, 2012 at 12:32
  • This is a really good question! To distill it down, the issue is: Why is Regex.Matches("A", @"\w*").Count equal to 2 rather than 1? And although a similar question has been asked and answered, for me the question of why is still open. After all, "A" is also 65 empty strings, followed by A, followed by 324 empty strings, so why 2 matches rather than 390?! Commented Feb 10, 2012 at 12:35

2 Answers


Start your pattern with ^ and end it with $ and your problem is solved.

string s = Regex.Replace("AAAA", @"^\w*$", "B");
Console.Write(s);

Alternatively, you can avoid matching zero-length strings by using the + quantifier instead of the * quantifier:

string s = Regex.Replace("AAAA", @"\w+", "B");
Console.Write(s);

5 Comments

Out of interest, are you able to explain why this works? It does, but I'm interested in why the original example behaves the way it does, and thus why you need to do this at all...
It seems ^\w* is enough but I would very much like to know why.
@Ingenu: having asked the question myself just after you I think I answered it. :)
I would also like to know why!
@didster Some of the answers here stackoverflow.com/q/9228096/460785 get quite close to explaining why.

Matt Fellows has the right answer on how to fix it. I believe I can try to explain why it breaks like that, though...

Consider this:

Regex.Replace("AAA", @"Z*", "!!!|$&|")

It will return:

!!!||A!!!||A!!!||A!!!||

Z* in this case will match a series of zero-length strings, each one sitting before or after one of the A characters. The $& inserts the matched string, which in this case we can see is empty.

I believe a similar thing happens with

Regex.Replace("AAA", @"A*", "!!!|$&|")

Which returns

!!!|AAA|!!!||

The A* matching starts at the beginning and matches "AAA". It then matches "" at the end of the string and stops.
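To make the two matches visible for the pattern from the question, here is a minimal sketch that lists each match with its position and length. The class name MatchPositions is just a placeholder for illustration; Regex.Matches, Match.Index, Match.Length and Match.Value are the standard System.Text.RegularExpressions members.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchPositions // hypothetical name, for illustration only
{
    static void Main()
    {
        // List every match of \w* in "A" with its start index and length.
        foreach (Match m in Regex.Matches("A", @"\w*"))
        {
            Console.WriteLine($"index={m.Index} length={m.Length} value=\"{m.Value}\"");
        }
        // Prints:
        // index=0 length=1 value="A"
        // index=1 length=0 value=""
    }
}
```

The second line of output is the zero-length match at the end of the string, which is the one that produces the extra "B" in the replacement.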

I'm not sure if this is desired behaviour in this case but I suspect it is a necessary side effect of the way A* matches zero length strings.

Of course, when you change the pattern to ^A*$, the anchoring means that there is only one possible match, so it behaves more like what is expected in this case.
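As a rough side-by-side of the options, the sketch below runs the question's input through the unanchored pattern, the anchored pattern, the + quantifier, and the instance overload of Regex.Replace that takes a maximum replacement count. The count-limited call is my addition rather than something proposed in this thread: it relies on the fact that Perl's s/// without /g replaces only the first match, whereas Regex.Replace replaces every match by default. The class name ReplaceOptions is a placeholder.

```csharp
using System;
using System.Text.RegularExpressions;

class ReplaceOptions // hypothetical name, for illustration only
{
    static void Main()
    {
        // Default: every match is replaced, including the empty one at the end.
        Console.WriteLine(Regex.Replace("A", @"\w*", "B"));         // BB

        // Anchored: only one match is possible.
        Console.WriteLine(Regex.Replace("A", @"^\w*$", "B"));       // B

        // One-or-more: zero-length matches are never produced.
        Console.WriteLine(Regex.Replace("A", @"\w+", "B"));         // B

        // Replace at most one match, similar to Perl's s/// without /g.
        Console.WriteLine(new Regex(@"\w*").Replace("A", "B", 1));  // B
    }
}
```

Note that these are not interchangeable for an empty input: \w+ leaves "" untouched, while the anchored and count-limited variants both turn it into "B".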

7 Comments

Just came to the same conclusion myself while playing. If you use + (1 or more) rather than * (0 or more) that also solves the problem.
Yes indeed, though of course the question becomes why * was used in the first place instead of +. There may have been a good reason, for example that an empty string should also match.
Also if you want to edit your answer to reflect this information you are welcome. I do feel a bit like I may be ninjaing your answer since I'd not have got here without your initial observation. :)
Heh - I have put the + vs * part into my answer, but your answer is much more complete so I will leave you with it ;)
Going to have to start a new thread soon - but why would string s = Regex.Replace(".A.", @"\w*", "B"); give B.BB.B - it should either be B.BBB.B or B.B.B, shouldn't it?
