Parsing CSS with a regex

Question

I want to scan through a CSS file and capture both the comments and the CSS rules. I've come up with a regex that's almost there, but it misses part of any rule whose selector list is split across several lines, e.g.

ul.menu li a, # Won't capture this line
ul.nice-menu li a { text-decoration: none; cursor:pointer; }

Here's the regex that I'm working with:

(\/\*[^.]+\*\/\n+)?([\t]*[a-zA-Z0-9\.# -_:@]+[\t\s]*\{[^}]+\})

I've been testing this at rubular.com and here is what it currently matches, and what the array output is like.

Result 1

[0] /* Index */
/*
GENERAL

PAGE REGIONS
- Header bar region
- Navigation bar region
- Footer region           
SECTION SPECIFIC
- Homepage
- News */

[1] html { background: #ddd; }

Result 2

[0]
[1] body { background: #FFF; font-family: "Arial", "Verdana", sans-serif; color: #545454;}

I should point out that I'm still new to regular expressions, so if anyone can show me where I'm going wrong, it'd be much appreciated :)

BTW: I'm using PHP and preg_match_all

Can you define what kind of output you want? "You want CSS and comments" is too broad to determine what you want. Specify an array of some sort.
– Robert Cabri, Commented Oct 24, 2009 at 14:25
I've added what the expected output is currently like to the question, hope this helps :)
– Damian, Commented Oct 24, 2009 at 14:46

peter.murray.rust · Accepted Answer · 2009-10-24 14:41:58Z

6

CSS cannot be fully parsed with a regex (see CSS Grammar: http://www.w3.org/TR/CSS2/grammar.html). The {...} can be split over lines, for example, and your current version wouldn't handle this. If you need to do this, you should read the CSS spec and use a tool like ANTLR to generate a parser.

Here is an example from the W3C spec (http://www.w3.org/TR/CSS2/syndata.html):

@import "subs.css";
@import "print-main.css" print;
@media print {
  body { font-size: 10pt }
}
h1 {color: blue }

No normal regex is powerful enough to deal with nested {...} blocks and the like, let alone the contents of the imported stylesheets.
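To illustrate why nested braces call for a scanner rather than a single pattern, here is a minimal sketch in Python (not the asker's PHP). It splits a stylesheet into top-level statements by counting brace depth while skipping comments and quoted strings. The function name `split_top_level` is made up for this example. The sketch ignores `url()` tokens, escapes outside strings, and error recovery, so treat it as an illustration and not as a CSS parser.

```python
def split_top_level(css: str) -> list[str]:
    """Split CSS text into top-level statements by tracking brace depth."""
    statements = []
    depth = 0   # current nesting level of { ... }
    start = 0   # index where the current statement begins
    i = 0
    n = len(css)
    while i < n:
        ch = css[i]
        # Skip /* ... */ comments so braces inside them are not counted.
        if css.startswith("/*", i):
            end = css.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        # Skip quoted strings, honouring backslash escapes.
        if ch in "\"'":
            i += 1
            while i < n and css[i] != ch:
                if css[i] == "\\":
                    i += 1
                i += 1
            i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # A block closed at the top level: one complete statement.
                statements.append(css[start:i + 1].strip())
                start = i + 1
        elif ch == ";" and depth == 0:
            # Brace-less at-rules such as @import end at a top-level ';'.
            statements.append(css[start:i + 1].strip())
            start = i + 1
        i += 1
    return statements


sample = '''@import "subs.css";
@import "print-main.css" print;
@media print {
  body { font-size: 10pt }
}
h1 {color: blue }
'''

for statement in split_top_level(sample):
    print(repr(statement))
```

On the W3C sample, the whole `@media print { ... }` block comes back as one statement with its inner `body` rule intact. A pattern built on `\{[^}]+\}` cannot do this, because it stops at the first closing brace. A comment that precedes a statement stays attached to that statement's text.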

edited Oct 24, 2009 at 14:41

answered Oct 24, 2009 at 14:30


8 Comments

mauris Over a year ago

remove all newlines and he'll be safe!

peter.murray.rust Over a year ago

@Mauris then there will be a single line.

ax. Over a year ago

@Mauris he won't. Just think of "{" inside comments, strings, ... He should definitely go with a specialized CSS parser.

Damian Over a year ago

I'm going for a simple case: no nested curly braces {...}. The regex that I'm currently working with matches declarations that span multiple lines. If someone can manage to tweak the current one to handle the case outlined above, I'd be very grateful!

peter.murray.rust Over a year ago

@Damian: yes, it's often possible to choose a specific subset of a language that you can write a parser for, but as soon as you get into the open world you will immediately find examples that break your tools. That's why it's important to adhere to standards and use existing tools rather than writing your own. You'll end up with a lot of work and it will still keep breaking.
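For the restricted case Damian describes (flat rules, no nested braces), the question's pattern fails because its selector character class contains no newline, so a match cannot start on the line above the one that holds the `{`. Below is a hedged sketch of one way to allow multi-line selector lists. It is written in Python rather than PHP. The pattern uses only constructs that PCRE also offers, so it should carry over to `preg_match_all`, but that has not been tested here. The names `RULE` and `parse_flat_css` are made up for this example. The pattern still breaks on `@media` blocks, and on selectors that contain `/`, `;` or braces inside quoted attribute values.

```python
import re

RULE = re.compile(
    r"((?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/\s*)*)"  # group 1: comments directly before the rule
    r"([^{};/]+?)\s*"                             # group 2: selector list, may span lines
    r"\{([^{}]*)\}"                               # group 3: declarations, no nested braces
)


def parse_flat_css(css: str) -> list[dict]:
    """Extract flat rules as dicts with their comment, selectors and declarations."""
    rules = []
    for match in RULE.finditer(css):
        comments, selectors, body = match.groups()
        rules.append({
            "comment": comments.strip(),
            # Split the selector list on commas and trim surrounding whitespace.
            "selectors": [s.strip() for s in selectors.split(",")],
            "declarations": body.strip(),
        })
    return rules


sample = """/* Index */
html { background: #ddd; }

ul.menu li a,
ul.nice-menu li a { text-decoration: none; cursor:pointer; }
"""

for rule in parse_flat_css(sample):
    print(rule)
```

Because the selector group accepts any character other than braces, `;` and `/`, a comma followed by a newline is consumed like any other selector text, and both `ul.menu li a` and `ul.nice-menu li a` end up in the same rule.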


Jørgen Fogh · Accepted Answer · 2009-10-24 14:44:12Z

0

What language are you using?

You should probably just use a library to parse the CSS. Libraries can save you a lot of grief.

answered Oct 24, 2009 at 14:44


1 Comment

Damian Over a year ago

I'm using PHP, and preg_match_all
