Regular expression that validates a CSS selector

Question

What is a regular expression that can be used to validate a CSS selector, and that does so in a way that an invalid selector halts quickly?

Valid selectors:

EE
#myid
.class
.class.anotherclass
EE .class
EE .class EEE.anotherclass
EE[class="test"]
.class[alt~="test"]
#myid[alt="test"]
EE:hover
EE:first-child
E[lang|="en"]:first-child
EE#test .class>.anotherclass
EE#myid.classshit.anotherclass[class~="test"]:hover
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hover

Invalid selectors, e.g. contain extra whitespace at the end of the line:

EE:hover   EE
EE .class EEE.anotherclass 
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov     9
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov  -daf

You might be able to write a RE for this, but are you sure that writing a grammar parser wouldn't be better? – zigdon, Commented Sep 30, 2010 at 21:57
I was just about to post an answer to your other question, but you deleted it. – Daniel Vandersluis, Commented Sep 30, 2010 at 21:58
"Invalid selectors, e.g. contain extra whitespace at the end of the line:" What line would that be? I've never run into a CSS parser (e.g., for a CSS file, style attribute, etc.) that had issues with trailing whitespace. – T.J. Crowder, Commented Sep 30, 2010 at 22:00

fuxia · Accepted Answer · 2013-03-10 04:31:01Z

4

Regular expressions are the wrong tool. CSS selectors are way too complex. Example:

bo\
dy:not(.\}) {}

Use a parser with a real tokenizer, like this one: PHP-CSS-Parser. It is easier to port it to Java than to get a regex right.
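To make "a real tokenizer" concrete, here is a minimal hypothetical sketch in Python. It is not PHP-CSS-Parser's API and covers only a small subset of CSS. It scans the selector one character at a time and lets a backslash escape the next character, so the class name in .\} comes out as } instead of being lost, as it is with a naive \w-based pattern. The function names and token kinds (IDENT, CLASS, HASH, DELIM, WS) are made up for illustration; CSS hex escapes and most other rules are deliberately left out.

```python
import re


def read_ident(s, i):
    """Read a name starting at s[i]; return (name, index after the name).

    Simplified on purpose: a backslash escapes exactly one following
    character (CSS hex escapes such as \\7D are NOT handled).
    """
    out = []
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s) and s[i + 1] != "\n":
            out.append(s[i + 1])  # the escaped character becomes part of the name
            i += 2
        elif ch.isalnum() or ch in "-_":
            out.append(ch)
            i += 1
        else:
            break
    return "".join(out), i


def tokenize(s):
    """Split a selector into (kind, value) tuples; kinds are illustrative."""
    tokens = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isspace():
            while i < len(s) and s[i].isspace():  # collapse a run of whitespace
                i += 1
            tokens.append(("WS", " "))
        elif ch in ".#":
            name, i = read_ident(s, i + 1)
            if not name:
                raise ValueError(f"expected a name after {ch!r}")
            tokens.append(("CLASS" if ch == "." else "HASH", name))
        elif ch in ':()[]>+~,*="':
            tokens.append(("DELIM", ch))
            i += 1
        else:
            name, i = read_ident(s, i)
            if not name:
                raise ValueError(f"unexpected character {ch!r} at index {i}")
            tokens.append(("IDENT", name))
    return tokens


selector = "body:not(.\\})"  # the example above, without the line break and the {}
print(re.findall(r"[.#]?\w+", selector))  # naive pattern: the escaped class name is lost
print(tokenize(selector))
```

A parser would then consume this token stream, which is where rules such as "a class name must follow the dot" become simple checks instead of ever-longer regex alternations.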

edited Mar 10, 2013 at 4:31

answered Sep 30, 2010 at 22:05

fuxia

André Banderas · Accepted Answer · 2011-01-01 03:02:49Z

1

This is a regex that I use in my code:

[+>~, ]?\s*(\w*[#.]\w+|\w+|\*)+(:[\w\-]+\([\w\s\-\+]*\))*(\[[\w ]+=?[^\]]*\])*([#.]\w+)*(:[\w\-]+\([\w\s\-\+]*\))*

After tokenizing, I use the trim function to remove extra spaces, e.g.:

expression:

EE.class      EE#id.class

tokens:

EE.class

   EE#id.class

tokens after trim:

EE.class

EE#id.class

Or, e.g.:

>EE.class (a leading > signals a direct child; I then handle it with substring code)

Other routines can check, for example, whether a token is a number.

You can use http://regexpal.com/ for tests.
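A minimal sketch of the tokenize-then-trim approach described above, assuming Python's re module (the answer does not say which language it was used with). The pattern is the one from this answer, split across adjacent raw strings only for readability; regex_tokens is a hypothetical helper name.

```python
import re

# The pattern from this answer; the adjacent raw strings concatenate
# to exactly the original expression.
TOKEN = re.compile(
    r"[+>~, ]?\s*(\w*[#.]\w+|\w+|\*)+"
    r"(:[\w\-]+\([\w\s\-\+]*\))*"
    r"(\[[\w ]+=?[^\]]*\])*"
    r"([#.]\w+)*"
    r"(:[\w\-]+\([\w\s\-\+]*\))*"
)


def regex_tokens(selector):
    """Return (raw_tokens, trimmed_tokens) for a selector string."""
    # finditer + group(0): findall would return only the capture groups.
    raw = [m.group(0) for m in TOKEN.finditer(selector)]
    return raw, [t.strip() for t in raw]


raw, trimmed = regex_tokens("EE.class      EE#id.class")
print(trimmed)                       # ['EE.class', 'EE#id.class']
print(regex_tokens(">EE.class")[1])  # ['>EE.class']
print(regex_tokens("EE:hover")[1])   # ['EE', 'hover']
```

Because the pattern is not anchored, finditer silently skips characters it cannot match. The pseudo-class group requires parentheses, so EE:hover comes back as two tokens, EE and hover, and trailing whitespace produces no token at all; on its own, this pattern splits selectors rather than validating them.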

edited Jan 1, 2011 at 3:02

answered Dec 31, 2010 at 4:08

André Banderas

Tony Ennis · Accepted Answer · 2010-10-01 00:06:29Z

0

The problem with your typical regular expression is that it cannot handle arbitrary levels of nesting: it has only a fixed, finite set of states, so it cannot count without limit. Consider a string of some number of a's followed by the same number of b's, such as aaabbb, and a reasonable regexp a*b*. When the regexp reaches the first 'b', it has no record of how many a's it recognized, so it can't insist on the same number of b's; a*b* accepts aab just as happily as aaabbb.

Now replace a and b with ( and ), IF and END, <x> and </x>, etc., and you can see the problem.
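The argument can be made concrete in Python (an illustrative sketch; naive_accepts and balanced_ab are hypothetical names). A full match against a*b* accepts aab just as readily as aaabbb, while a recognizer that keeps one counter, a small piece of unbounded memory, accepts exactly the strings of n a's followed by n b's.

```python
import re

NAIVE = re.compile(r"a*b*")


def naive_accepts(s):
    # fullmatch: the whole string must consist of a's followed by b's
    return NAIVE.fullmatch(s) is not None


def balanced_ab(s):
    """Accept exactly n a's followed by n b's (n >= 0), using one counter."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":
        count += 1  # remember how many a's we have seen
        i += 1
    while i < len(s) and s[i] == "b":
        count -= 1  # each b cancels one a
        i += 1
    return i == len(s) and count == 0


for s in ["aaabbb", "aab", "abab"]:
    print(s, naive_accepts(s), balanced_ab(s))
```

The same idea, a depth counter incremented on ( and decremented on ), is what checks for balanced parentheses rely on, and a single classic regex has nowhere to keep that counter.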

answered Oct 1, 2010 at 0:06

Tony Ennis

1 Comment

Sarabjot Over a year ago

I remember Jeff Atwood talking about not writing an XML parser with regexes for just that reason. I was only asking about the selector, which has a simple grammar, e.g. tag#id.aClass.anotherClass:pseudo-class[matching="element"]. A second selector could follow it, but as long as there is at least one, the number doesn't matter.
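Under the simple grammar described in this comment, a single anchored regex is workable. The following is a minimal Python sketch, not taken from any of the answers: the names IDENT, ATTR, SIMPLE, COMPOUND, COMBINATOR and is_valid_selector are hypothetical, identifiers are assumed to be ASCII, attribute values must be double-quoted, and a descendant combinator must be exactly one space (matching the question's rule that extra whitespace is invalid). Each piece starts with a distinct character (#, ., [, :, or a combinator), so there should be little for the engine to backtrack over, and fullmatch makes trailing whitespace fail.

```python
import re

# Hypothetical building blocks for the simplified grammar (ASCII names only).
IDENT = r"[A-Za-z_][A-Za-z0-9_-]*"                    # tag, id, class, attribute or pseudo-class name
ATTR = rf'\[{IDENT}(?:[~|^$*]?="[^"]*")?\]'            # [attr], [attr="v"], [attr~="v"], [attr|="v"], ...
SIMPLE = rf"(?:#{IDENT}|\.{IDENT}|{ATTR}|:{IDENT})"    # #id, .class, [attr...], :pseudo
COMPOUND = rf"(?:(?:{IDENT}|\*){SIMPLE}*|{SIMPLE}+)"  # optional tag or *, then any number of the above
COMBINATOR = r"(?: ?[>+~] ?| )"                        # >, + or ~ (optionally spaced), or exactly one space
SELECTOR = re.compile(rf"{COMPOUND}(?:{COMBINATOR}{COMPOUND})*")


def is_valid_selector(text):
    # fullmatch anchors both ends, so trailing whitespace or junk fails.
    return SELECTOR.fullmatch(text) is not None


print(is_valid_selector("EE .class EEE.anotherclass"))   # True
print(is_valid_selector("EE .class EEE.anotherclass "))  # False (trailing space)
print(is_valid_selector("EE:hover   EE"))                # False (extra spaces)
```

It deliberately rejects much real CSS, including escapes, :not(...), ::pseudo-elements, unquoted attribute values and comma-separated selector lists, which is the trade-off the other answers warn about.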
