0

What regular expression can be used to validate a CSS selector, in such a way that an invalid selector fails quickly?

Valid selectors:

EE
#myid
.class
.class.anotherclass
EE .class
EE .class EEE.anotherclass
EE[class="test"]
.class[alt~="test"]
#myid[alt="test"]
EE:hover
EE:first-child
E[lang|="en"]:first-child
EE#test .class>.anotherclass
EE#myid.classshit.anotherclass[class~="test"]:hover
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hover

Invalid selectors, e.g. contain extra whitespace at the end of the line:

EE:hover   EE
EE .class EEE.anotherclass 
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov     9
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov  -daf
3
  • You might be able to write a RE for this, but are you sure that writing a grammar parser wouldn't be better? Commented Sep 30, 2010 at 21:57
  • I was just about to post an answer at your other question but you deleted it. Commented Sep 30, 2010 at 21:58
  • "Invalid selectors, e.g. contain extra whitespace at the end of the line:" What line would that be? I've never run into a CSS parser (e.g., for a CSS file, style attribute, etc) that had issues with trailing whitespace. Commented Sep 30, 2010 at 22:00

3 Answers

4

Regular expressions are the wrong tool. CSS selectors are way too complex. Example:

bo\
dy:not(.\}) {}

Use a parser with a real tokenizer, like this one: PHP-CSS-Parser. It is easier to port it to Java than to get a regex right.
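To give an idea of what the tokenizer approach looks like, here is a minimal sketch in Python. It is not PHP-CSS-Parser and it is not a full CSS parser: it only covers the subset of selector syntax shown in the question (type, `#id`, `.class`, `[attr op "value"]`, `:pseudo`, a single-space descendant combinator and a bare `>`). The names `TOKEN_RE`, `tokenize` and `is_valid_selector` are made up for this example, and the rule that double or trailing spaces are errors is an assumption taken from the question's "invalid" list, not from the CSS specification. Escapes such as the `bo\` example above and functional pseudo-classes like `:not(...)` are deliberately left out.

```python
import re

# Simplified identifier; real CSS identifiers also allow escapes and non-ASCII.
IDENT = r"-?[A-Za-z_][A-Za-z0-9_-]*"

# One named group per token kind (hypothetical names, subset of CSS only).
TOKEN_RE = re.compile(rf"""
    (?P<attr>\[{IDENT}(?:[~|]?="[^"\\\n]*")?\])  # attribute test, optional operator and value
  | (?P<id>\#{IDENT})
  | (?P<cls>\.{IDENT})
  | (?P<pseudo>:{IDENT})
  | (?P<type>{IDENT}|\*)
  | (?P<child>[>])
  | (?P<space>[ ])
""", re.VERBOSE)


def tokenize(selector):
    """Return a list of token kinds, or None at the first unrecognized character."""
    pos, kinds = 0, []
    while pos < len(selector):
        m = TOKEN_RE.match(selector, pos)
        if m is None:
            return None  # halt immediately, no backtracking over earlier tokens
        kinds.append(m.lastgroup)
        pos = m.end()
    return kinds


def is_valid_selector(selector):
    """Check the token sequence against the question's (strict) rules."""
    kinds = tokenize(selector)
    if not kinds:
        return False
    prev = "combinator"  # pretend a combinator precedes the first token
    for kind in kinds:
        is_combinator = kind in ("space", "child")
        if is_combinator and prev == "combinator":
            return False  # leading combinator or two combinators in a row
        if kind == "type" and prev != "combinator":
            return False  # a type selector may only start a compound
        prev = "combinator" if is_combinator else "simple"
    return prev != "combinator"  # reject a trailing combinator or space
```

Because the tokenizer consumes the input strictly left to right and stops at the first character it cannot match, a bad selector is rejected in one pass, which is the "halts quickly" property the question asks for.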


1

This is a regex that I use in my code:

[+>~, ]?\s*(\w*[#.]\w+|\w+|\*)+(:[\w\-]+\([\w\s\-\+]*\))*(\[[\w ]+=?[^\]]*\])*([#.]\w+)*(:[\w\-]+\([\w\s\-\+]*\))*

After tokenizing, I use the trim function to remove extra spaces from each token, e.g.:

expression:

EE.class      EE#id.class

tokens:

EE.class

   EE#id.class

tokens after trim:

EE.class

EE#id.class

Or, e.g.:

>EE.class (the leading > marks a direct child; I detect it and handle it with ordinary substring code)

Other routines can then check, for example, whether a token is a number.

You can use http://regexpal.com/ for tests.
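The tokenize-then-trim steps described above can be sketched in Python roughly like this. The pattern is the one from this answer, copied unchanged; the function name `tokenize` and the use of Python's `re.finditer` are assumptions made for the example, since the answer does not say which language it was written for.

```python
import re

# The pattern from the answer, split across lines for readability only.
TOKEN_RE = re.compile(
    r"[+>~, ]?\s*(\w*[#.]\w+|\w+|\*)+"
    r"(:[\w\-]+\([\w\s\-\+]*\))*"
    r"(\[[\w ]+=?[^\]]*\])*"
    r"([#.]\w+)*"
    r"(:[\w\-]+\([\w\s\-\+]*\))*"
)


def tokenize(selector):
    """Find every token, then trim the whitespace captured around it."""
    return [m.group(0).strip() for m in TOKEN_RE.finditer(selector)]


print(tokenize("EE.class      EE#id.class"))  # ['EE.class', 'EE#id.class']
print(tokenize(">EE.class"))                  # ['>EE.class']
```

Note that `finditer` silently skips any text the pattern does not match, so this splits a selector into tokens but does not by itself reject an invalid one. A validity check would additionally have to confirm that the matches cover the whole input.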


0

The problem with your typical regular expression is that it is unable to handle arbitrary levels of nesting. It has no memory. Consider a string of some number of a's followed by the same number of b's, such as aaabbb, and a reasonable regexp a*b*. When the regexp gets to the first 'b' it has no memory of how many a's it recognized, and therefore it can't require the same number of b's.

Now replace a and b with ( and ), IF and END, <x> and </x> etc... and you can see the problem.
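A small Python illustration of the point, as a sketch: the pattern `a*b*` happily accepts a string with mismatched counts, and the counting has to be done by code outside the pattern. The function names here are made up for the example.

```python
import re


def matches_a_star_b_star(s):
    """True if s is some a's followed by some b's, in any quantities."""
    return re.fullmatch(r"a*b*", s) is not None


def has_equal_as_and_bs(s):
    """Same shape check, plus an explicit count comparison done in code."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


print(matches_a_star_b_star("aaabb"))  # True: the regex cannot count
print(has_equal_as_and_bs("aaabb"))    # False
print(has_equal_as_and_bs("aaabbb"))   # True
```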

1 Comment

I remember Jeff Atwood talking about not writing an XML parser with RegEx's for just that reason. I was just asking about the selector which has a simple grammar. Example: tag#id.aClass.anotherClass:pseudo-class[matching="element"]. Which could have a second selector provided but as long as there is one or more you don't care.
