3

I need to build a JavaScript regular expression with the following constraints:

  • The input string needs to be at least 6 characters long
  • The input string needs to contain at least 1 alphabetical character
  • The input string needs to contain at least 1 non-alphabetical character

I'm really missing lookbehind support in JavaScript. This is what I came up with:

((([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))....)|
(.(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))...)|
(..(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))..)|
(...(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])).)|
(....(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])))

This looks pretty long. Is there a better way?

How I came to this:

  1. Regex for alphabetical character is [a-zA-Z]
  2. Regex for non-alphabetical character is [^a-zA-Z]
  3. So I need to look for a [a-zA-Z][^a-zA-Z] or [^a-zA-Z][a-zA-Z]
    so (([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])).
  4. I need to check for n preceding characters and 4-n succeeding characters (the pair itself takes up two of the six).
2
  • Why didn't you accept any of the answers? Is there still a problem? Commented Nov 29, 2012 at 11:18
  • I always give people an opportunity to join the discussion. Commented Nov 29, 2012 at 11:32

3 Answers

5
/^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/

This means:

^ - start of the string
(?= ... ) - followed by (i.e. an independent submatch; it won't move the current match position)
.{6} - six characters ("start of string followed by six characters" implements the "must be at least six characters long" rule)
.* - 0 or more of any character (except newline - may need to fix this?)
[a-zA-Z] - a letter (.*[a-zA-Z] therefore finds any string with a letter anywhere in it (technically it finds the last letter in it))
[^a-zA-Z] - a non-letter character
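
On the newline caveat: in JavaScript, `.` does not match line terminators, so `.*` stops at the first newline. Here is a minimal sketch of one possible fix, assuming multi-line input should be accepted at all (the question doesn't say). It swaps `.` for `[\s\S]`; the names `singleLine` and `anyChar` are made up for illustration.

```javascript
// The pattern from this answer: `.` stops at line terminators, so the
// six-character check fails when a newline appears among the first six.
const singleLine = /^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/;

// Assumption: multi-line input should also pass.
// [\s\S] matches any character, including newlines.
const anyChar = /^(?=[\s\S]{6})(?=[\s\S]*[a-zA-Z])(?=[\s\S]*[^a-zA-Z])/;

const input = "123\n456abc";
console.log(singleLine.test(input)); // false
console.log(anyChar.test(input));    // true
```

If newlines can never occur in the input (a single-line password field, say), the original pattern is fine as it stands.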

In summary: Starting from the beginning of the string, we try to match each of the following in turn:

  • 6 characters (if we find those, the string must be 6 characters long (or more))
  • an arbitrary string followed by a letter
  • an arbitrary string followed by a non-letter
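
To see the three checks in action, here is a small sketch. `isValid` is a hypothetical helper name, and the sample inputs are only illustrative.

```javascript
// Lookahead-only pattern from this answer.
const pattern = /^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/;

// Hypothetical helper: true when all three lookaheads succeed.
function isValid(input) {
  return pattern.test(input);
}

console.log(isValid("abc123"));       // true: six characters, letters and digits
console.log(isValid("abcdef"));       // false: no non-letter
console.log(isValid("123456"));       // false: no letter
console.log(isValid("ab1"));          // false: shorter than six characters
console.log(isValid("hello world!")); // true: the space and "!" are non-letters
```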

9 Comments

Pardon me, I'm a regex noob: is the ^ at the start needed?
Can you explain your regex a bit?
@dystroy Technically, no. But it's an optimization: If the regex engine isn't very clever, it will have to repeat the check at every position of the input string otherwise.
This has nothing to match, just look-aheads
@Some1.Kill.The.DJ As there is no $ at the end, I see no problem with {6}. The group doesn't have to match the whole string.
3

Use this regex...

/^(?=.{6,})(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).*$/
  -------- ------------- --------------
    ^          ^              ^
    |          |              |-> checks for at least one non-letter
    |          |-> checks for at least one letter
    |-> checks for 6 or more characters

(?=) is a zero-width lookahead: it checks whether its pattern matches at the current position but doesn't consume any characters. That is why we can use multiple lookaheads back to back.
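
A quick way to see the zero-width behaviour is to drop the trailing `.*$` and inspect what `exec` returns. This is only a sketch; the variable names are made up for illustration.

```javascript
// Same lookaheads as above, without the trailing .*$
const lookaheadsOnly = /^(?=.{6,})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/;

const result = lookaheadsOnly.exec("abc123");
console.log(result.index);     // 0: the match starts at the beginning
console.log(result[0].length); // 0: nothing was consumed

// Every lookahead starts from the same position, so their order
// does not affect which strings are accepted.
const reordered = /^(?=.*[^a-zA-Z])(?=.*[a-zA-Z])(?=.{6,})/;
console.log(reordered.test("abc123")); // true
console.log(reordered.test("123abc")); // true
```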

2 Comments

Can you explain your regex a bit?
@KeesC.Bakker (?=) does look-ahead
0

This is similar to the other answers, so it doesn't need much explanation. I think the best way is:

/^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6,}$/

This starts at the beginning of the string, looks ahead for an alphabetical character, looks ahead for a non-alphabetical character and finally matches a string of 6 or more characters. I don't think a separate lookahead for the length is needed.
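
As a rough check, the sketch below compares this pattern with the three-lookahead version on a handful of sample inputs and shows that the whole string comes back as the match. The sample list is arbitrary, so agreement here is an illustration, not a proof of equivalence.

```javascript
const threeLookaheads = /^(?=.{6,})(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).*$/;
const twoLookaheads = /^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6,}$/;

// Arbitrary sample inputs, chosen for illustration only.
const samples = ["abc123", "abcdef", "123456", "ab1", "a-b-c-d"];

// Collect inputs on which the two patterns give different answers.
const disagreements = samples.filter(
  (s) => threeLookaheads.test(s) !== twoLookaheads.test(s)
);
console.log(disagreements.length); // 0

// .{6,} consumes the input, so the match is the whole string.
const m = "abc123".match(twoLookaheads);
console.log(m[0]); // "abc123"
```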

4 Comments

Sure, you can always reorder the checks and remove the (?= ) around the last one.
That's not just about reordering, it's about removing a redundant check and obtaining a string as a match. Using only lookaheads doesn't output a match string even if the input string matches.
OK, you can also always add .*$ to the regex, which is effectively what you've done: 1) /^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).*$/ 2) Reorder: /^(?=.*[a-zA-Z])(?=.*[^a-zA-Z])(?=.{6}).*$/ 3) Remove: /^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6}.*$/ 4) Fuse: /^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6,}$/ If you'd skipped step 2, you'd've ended up with /^(?=.{6})(?=.*[a-zA-Z]).*[^a-zA-Z].*$/, which works the same.
Sorry if I was not clear. My intended purpose with this answer was to match the whole string. I did it. Also @Some1.Kill.The.DJ did it, but I think the lookahead to check the string's length is useless and can be effortlessly joined with the string's match itself resulting in two lookaheads instead of three which I think (and might be disproved) results in slightly better performance.
