0

I'm using .NET's regex as part of my university assignment (writing a compiler). I found an interesting caveat that's driving me nuts.

I have this regex pattern: \A(?:(func)[^\w\d]*|(func)\z)

When I try to match the string "func sum(a, b)\n...", the resulting Match object has one item in its CaptureCollection, containing the string "func ".

Why am I getting the whitespace along with my keyword?

2
  • There must be a space or a new line character after func, else it won't match inputs like func sum(a, b) Commented Oct 8, 2015 at 19:32
  • There are lots of ways to do it, it's not really clear what your goal is imho. Your pattern starts out saying "match but don't capture"... but without knowing what you do with that it's anyone's guess. Commented Oct 8, 2015 at 19:39

3 Answers

4

You're talking about item #0. The item at index 0 is always the whole match. The following items are the captured groups.

You got a match from the (func)[^\w\d]* part, and [^\w\d]* matched the whitespace you're seeing in the result. The capturing group (func) itself still contains only "func".
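To make the difference between the whole match and the captured group concrete, here is a minimal sketch in Python rather than .NET (an assumption made purely for illustration, since the question uses .NET). Python's `re` module numbers groups the same way, with group 0 as the whole match, but it spells .NET's `\z` as `\Z`, so the pattern is adjusted accordingly. The input is the question's sample without the elided remainder.

```python
import re

# The question's pattern, with .NET's \z written as Python's \Z.
pattern = re.compile(r"\A(?:(func)[^\w\d]*|(func)\Z)")

m = pattern.search("func sum(a, b)\n")

whole_match = m.group(0)  # everything the pattern consumed, including the space
keyword = m.group(1)      # only what the first (func) group captured
unused = m.group(2)       # the second alternative did not take part in the match

print(repr(whole_match))  # 'func '
print(repr(keyword))      # 'func'
print(repr(unused))       # None
```

In .NET, the corresponding values would be read from `match.Value` (or `match.Groups[0].Value`) and `match.Groups[1].Value`.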

1 Comment

Confusing Groups and Captures in .NET is pretty common, as these terms are often used interchangeably in regex parlance. But in .NET, you're almost always looking for groups. Each group contains one or more captures. You can get several captures when you use a capturing group inside a quantifier. Each capture holds the value you get at a given iteration. The value of the group is the value of the final capture. And finally, the whole match is a group, adding to the confusion.
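The point that a group's value is the value of its final capture can be sketched in Python (an assumption for illustration only; Python's `re` has no equivalent of .NET's CaptureCollection, so only the final value is observable there). The pattern and input below are made up for this example.

```python
import re

# A capturing group inside a quantifier is entered once per iteration.
m = re.fullmatch(r"(?:(\w),?)+", "a,b,c")

whole = m.group(0)  # group 0 is the whole match
last = m.group(1)   # group 1 keeps only the value from the final iteration

print(repr(whole))  # 'a,b,c'
print(repr(last))   # 'c'
```

In .NET, `match.Groups[1].Captures` would additionally expose the values from the earlier iterations.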
1

Because the [^\w\d]* part matches a space character; without it, the match is only func. Compare it to THIS

0

With [^\w\d]* you're using a negated character class: it matches any character that is neither a word character nor a digit immediately after "func", and whitespace qualifies.

The * allows any number of such characters, which is why whitespace following "func" ends up in the match.

I hope that answers your question as to why you're getting the whitespace.

I am uncertain what your exact goal is, so here are some examples:

This pattern matches "func" followed by zero or more word characters, so it would also match an identifier such as "function": \A(?:(func)[\w\d]*|(func)\z)

This pattern matches "func" at the beginning of each line (when RegexOptions.Multiline is set; otherwise ^ only matches at the start of the string) or at the end of the entire string: ^func|func\z

This pattern matches "func" at the beginning of the entire string or at the end of the entire string: \Afunc|func\z
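A quick way to compare the anchored variants is to run them side by side. The sketch below uses Python as a stand-in for .NET (an assumption for illustration), with `\Z` in place of .NET's `\z`; the sample input is made up. Note that `^` only matches at the start of every line when multiline mode is on (`re.MULTILINE` here, `RegexOptions.Multiline` in .NET).

```python
import re

text = "func a()\nfunc b()\nend func"

# \A anchors to the start of the whole string, \Z to its very end.
string_anchored = [m.start() for m in re.finditer(r"\Afunc|func\Z", text)]

# Without multiline mode, ^ behaves like \A.
default_caret = [m.start() for m in re.finditer(r"^func|func\Z", text)]

# With multiline mode, ^ also matches right after each newline.
multiline_caret = [
    m.start() for m in re.finditer(r"^func|func\Z", text, re.MULTILINE)
]

print(string_anchored)  # [0, 22]
print(default_caret)    # [0, 22]
print(multiline_caret)  # [0, 9, 22]
```

Only the multiline run picks up the "func" at the start of the second line; the other two see just the first and the last occurrence.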

You can find a quick reference page here: Regular Expression Language - Quick Reference
