Is there any way to define a custom character class in a C# regex?

In flex it is done in a very obvious way:

DIGIT    [0-9]
%%
{DIGIT}+    {printf( "An integer: %s (%d)\n", yytext, atoi( yytext ) );}

http://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples

As explained in this answer, in PHP defining a custom character class works like this:

(?(DEFINE)(?<a>[acegikmoqstz@#&]))\g<a>(?:.*\g<a>){2}

Is there a way to achieve this result in C#, without repeating the full character class definition each time it is used?

  • @Rawling: It's the same kind of question, but the point is: how to do it (if possible) in C#. Commented Aug 17, 2014 at 11:20
  • Reason for my reopen vote: The answer in the linked duplicate does not address C# at all; it explicitly deals only with Java and PHP. The solutions presented there are not applicable to C# (@Rawling). Commented Aug 20, 2014 at 14:45
  • @HugoRune Good point. I thought both answers were just language-specific versions of string concatenation, but the PHP one is doing something special. There is a C#-specific question here, and I expect most answers you attract will be along the same lines. Commented Aug 20, 2014 at 15:00
  • @Rawling Yes, I don't think a better solution exists either. But I was googling for this problem, and this question seemed to be the only applicable result, so a definitive answer here should be useful to future visitors, even if it is a negative one. Commented Aug 20, 2014 at 15:04
  • It may be possible to use named blocks and class subtraction to get the same effect, or there may be a named block that already matches the required characters. Commented Aug 20, 2014 at 15:13

2 Answers

C# regular expressions don't support defining a reusable, named character class, but you may be able to use named blocks and character class subtraction to get a similar effect.

.NET supports a large number of Unicode general categories (such as \p{Sm} for math symbols) and named blocks (such as \p{IsGreek} for Greek characters). There may be a category or block that already matches your requirements.
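As a rough illustration of matching with a named block or a general category, here is a minimal sketch using top-level statements. The variable names and sample inputs are made up for the example; only `\p{IsGreek}` and `\p{Sm}` come from the .NET regex syntax.

```csharp
using System;
using System.Text.RegularExpressions;

// \p{IsGreek} is a named block: the Greek and Coptic range U+0370-U+03FF.
var greek = new Regex(@"^\p{IsGreek}+$");
// \p{Sm} is a Unicode general category: "Symbol, Math".
var mathSymbol = new Regex(@"^\p{Sm}+$");

Console.WriteLine(greek.IsMatch("\u03B1\u03B2\u03B3")); // True: alpha, beta, gamma
Console.WriteLine(greek.IsMatch("abc"));                // False: Latin letters
Console.WriteLine(mathSymbol.IsMatch("+<="));           // True: all three are in Sm
Console.WriteLine(mathSymbol.IsMatch("-"));             // False: '-' is dash punctuation (Pd)
```

If one of these built-in names covers the characters you need, it can stand in for a hand-written class without any repetition.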

Character class subtraction lets you exclude the characters in one class or block from a broader class. The syntax is:

[base_group-[excluded_group]]

The following example, copied from the linked documentation, matches all Unicode characters except whitespace (\s), punctuation (\p{P}), characters in the Greek block (\p{IsGreek}) and the NEXT LINE control character (\x85):

[\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]]
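A small sketch of how subtraction behaves in practice, assuming top-level statements. The `consonants` and `filtered` names and the test strings are illustrative only; the second pattern is the class from the example above, anchored so that it has to cover the whole input.

```csharp
using System;
using System.Text.RegularExpressions;

// Lowercase ASCII letters minus the vowels, i.e. lowercase consonants.
var consonants = new Regex(@"^[a-z-[aeiou]]+$");
Console.WriteLine(consonants.IsMatch("rhythm")); // True: no vowels
Console.WriteLine(consonants.IsMatch("regex"));  // False: contains 'e'

// The documentation example, applied to every character of the input.
var filtered = new Regex(@"^[\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]]+$");
Console.WriteLine(filtered.IsMatch("abc123")); // True
Console.WriteLine(filtered.IsMatch("a b"));    // False: space is whitespace
Console.WriteLine(filtered.IsMatch("a,b"));    // False: ',' is punctuation
Console.WriteLine(filtered.IsMatch("\u03B1")); // False: Greek alpha is excluded
```

Note that the subtracted part has to be the last element inside the outer brackets.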

Nope, this is not supported in C#. This link will give you a nice overview of the .NET Regex engine. Note that nothing stops you from defining variables and using them to build your regex string:

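using System.Text.RegularExpressions;
// Define the character class once as a string, then concatenate it into the pattern.
var digit = "[0-9]";
var regex = new Regex(digit + "[A-Z]");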
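Applied to the PHP pattern from the question, the same idea could look roughly like this. This is a sketch, not a built-in feature: the constant name `A` is hypothetical, and the reuse happens through ordinary C# string interpolation rather than inside the regex engine.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical name: the class is written once as a plain string.
const string A = "[acegikmoqstz@#&]";

// Interpolate it wherever the PHP pattern used \g<a>.
// The doubled braces {{2}} produce a literal {2} quantifier.
var regex = new Regex($"{A}(?:.*{A}){{2}}");

Console.WriteLine(regex.ToString());       // [acegikmoqstz@#&](?:.*[acegikmoqstz@#&]){2}
Console.WriteLine(regex.IsMatch("a-c-e")); // True: three characters from the class
Console.WriteLine(regex.IsMatch("a-c-b")); // False: only two characters from the class
```

The fragment is pasted in as pattern text, so it must be a valid piece of regex on its own. If a fragment is meant to be literal text rather than pattern syntax, passing it through `Regex.Escape` first avoids surprises.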
