In an ASP.NET application, I need to validate user input against a configurable regular expression. The list of regular expressions is stored in a database and is user-configurable, not fixed. My problem is that in these regexps the dot is intended to match not any character, but any 'reasonable' character (in this context: letters, digits, and a few other ASCII characters). So validation is carried out in two steps:

  1. Check against reg exp from list
  2. Check against 'reasonable' characters with something like ^[\w.+/-]*$
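To make the two steps concrete, here is a minimal sketch. It uses Python's `re` module purely for illustration (the real application is ASP.NET), and the name `is_valid` is hypothetical:

```python
import re

# Step 2 whitelist from the list above; fullmatch() plays the role of ^...$.
REASONABLE = re.compile(r'[\w.+/-]*')

def is_valid(value: str, configured_pattern: str) -> bool:
    """Two-step check: configured regexp first, then the character whitelist."""
    # Step 1: the user-configurable regexp loaded from the database.
    if re.fullmatch(configured_pattern, value) is None:
        return False
    # Step 2: every character must be one of the 'reasonable' ones.
    return REASONABLE.fullmatch(value) is not None
```

With a configured pattern such as `a.c`, the value `a!c` passes step 1 but fails step 2, which is exactly the case the second check exists for.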

I'd like to use a single regexp, so that I can put it in a single regexp validator on the page, which gives a better user experience. I can do that by searching for the dots inside the regexp and replacing them with my stricter character class [\w.+/-]. But not all dots have the same meaning in a regexp.
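A small sketch of why blind replacement is not enough (Python used only for illustration; the pattern string is a made-up example):

```python
# Stricter character class that should stand in for the wildcard dot.
STRICT_CLASS = r'[\w.+/-]'

# Made-up pattern with three kinds of dot: a wildcard,
# a dot inside a character class, and a backslash-escaped dot.
pattern = r'a.b[.]\.'

# Replacing every dot also rewrites the two literal ones,
# which changes what the pattern means.
naive = pattern.replace('.', STRICT_CLASS)
print(naive)  # a[\w.+/-]b[[\w.+/-]]\[\w.+/-]
```

Only the first dot should have been touched; the other two were literal periods.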

So my question is: is there a tried and true way to find the dots inside a regexp, but only those that act as the match-any-character wildcard? A regexp, maybe?

  • So you are asking how to fix expressions that misuse . ? How will you be able to determine which dots actually mean "some other class"? Is there a specific pattern? Commented Jul 1, 2014 at 8:33
  • @PanagiotisKanavos yes. Any dots that have that meaning in a regexp, so for instance not [,.-] that have a literal meaning. Commented Jul 1, 2014 at 8:54
  • What I ask is - how can you know that a dot means one thing here but another a few characters later? The only hint that [,.-] may have to be treated differently is your comment, or the assumption that if a group would contain duplication with the "other" meaning it shouldn't be treated as that "other" meaning. Commented Jul 1, 2014 at 8:58
  • @PanagiotisKanavos no. By definition, in a regexp a dot is used as a literal when inside []. It's not something defined by me, it's how regexps work. But I'm not such an expert in regexp to be sure to find all special cases. Commented Jul 1, 2014 at 9:01

1 Answer

Just to get on the same page: in a.b[.]\.\[.\[\], two dots should be found: the one between a and b, and the last one, which sits between the escaped brackets. The others are literal periods.

Luckily for us, .NET regular expressions do not support \Q and \E for escaping pattern fragments, so that case does not need handling.

You can use this regex:

(?<!\\)\.(?!(?:\\[\]\[]|[^][])*(?<!\\)\])

On the demo, observe that only the right dots get matched.

Explanation

  • The lookbehind (?<!\\) ensures that the dot is not preceded by an escaping backslash
  • The \. matches the dot we want
  • The negative lookahead (?!(?:\\[\]\[]|[^][])*(?<!\\)\]) ensures that the dot is not followed by...
  • (?:\\[\]\[]|[^][])* any number of \[, \], or non-bracket characters, then
  • a closing bracket not preceded by a backslash: (?<!\\)\]
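For a quick check outside the demo, here is a minimal sketch that applies the pattern and swaps in the stricter class from the question. It uses Python's `re` module as a stand-in for .NET's Regex class, on the assumption that both engines treat these constructs the same way; `restrict_dots` is a hypothetical helper name.

```python
import re

# A dot that is neither backslash-escaped nor followed by
# the closing bracket of a character class.
WILDCARD_DOT = re.compile(r'(?<!\\)\.(?!(?:\\[\]\[]|[^][])*(?<!\\)\])')

# Stricter class from the question, used in place of each wildcard dot.
STRICT_CLASS = r'[\w.+/-]'

def restrict_dots(pattern: str) -> str:
    """Replace wildcard dots in `pattern` with STRICT_CLASS."""
    # A function is used as the replacement so the backslash in STRICT_CLASS
    # is inserted verbatim instead of being parsed as a replacement escape.
    return WILDCARD_DOT.sub(lambda m: STRICT_CLASS, pattern)

sample = r'a.b[.]\.\[.\[\]'
print([m.start() for m in WILDCARD_DOT.finditer(sample)])  # [1, 10]
print(restrict_dots(sample))  # a[\w.+/-]b[.]\.\[[\w.+/-]\[\]
```

One caveat: the one-character lookbehind cannot tell an escaping backslash from an escaped one, so in `\\.` (a literal backslash followed by a wildcard dot) the dot is skipped. If such patterns can appear in the configured list, they would need separate handling.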

Reference

1 Comment

Wow! Thanks for the reference links too
