Regular expression to match IDs and classes in an HTML page

Question

I'm trying to analyze HTML code and extract all CSS classes and IDs from the source. So I need to extract whatever is between the two quotation marks when they are preceded by either class= or id=:

id="<extract this>"

class="<extract this>"

This is the compulsory comment reminding you that you should be using an XML/HTML parser, not regex, for HTML. – Etheryte, Apr 24, 2014 at 18:07
Whatever programming language you are using, be sure to use a parser and not regex. – hwnd, Apr 24, 2014 at 18:07
Thank you for your suggestions, but if I wanted to use an HTML parser, I would have asked for that instead. I simply need to extract any classes and IDs from a page, that's all. I'm organizing stylesheets, so I want a list of the classes and IDs used in the plain HTML source before it gets compiled and jQuery Mobile blows it up with its own custom classes. – eveo, Apr 24, 2014 at 18:10

Tim Pietzcker · Accepted Answer · 2014-04-24 18:12:10Z

2

/(?:id|class)="([^"]*)"/gi

Replacement expression (keeps only the captured value): $1

This regex in English: match either "id" or "class", then an equals sign and a double quote, then capture everything that is not a double quote, up to the closing double quote. The g flag does this globally and the i flag makes it case-insensitive.
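A minimal sketch of using this pattern from JavaScript, assuming Node.js or a browser that supports `String.prototype.matchAll` (ES2020). The helper name `extractValues` and the sample markup are hypothetical. Each match object holds the whole `id="..."` text at index 0 and only the captured value at index 1, so a tester that highlights the full match is still capturing just the value in group 1.

```javascript
// Pattern from the answer above: g = find every match, i = ignore case.
const attrPattern = /(?:id|class)="([^"]*)"/gi;

// Hypothetical helper: returns the text captured by group 1 for each match.
function extractValues(html) {
  const values = [];
  for (const match of html.matchAll(attrPattern)) {
    // match[0] is the whole `id="..."` text; match[1] is only the value.
    values.push(match[1]);
  }
  return values;
}

const html = '<div id="header" class="nav main"><p CLASS="intro">Hi</p></div>';
console.log(extractValues(html)); // header, nav main, intro

// Using "$1" as the replacement swaps each attribute for its bare value.
console.log('<b id="x">'.replace(attrPattern, "$1")); // <b x>
```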

edited Apr 24, 2014 at 18:12

Tim Pietzcker

answered Apr 24, 2014 at 18:11

Pat Newell

2 Comments

Pat Newell Over a year ago

Nice, @Tim! Those regexes... they'll get you every time.

eveo Over a year ago

I entered this on regexr.com, along with an HTML page at the bottom, and it matches the entire id="..." attribute instead of just the value. Can you verify? cl.ly/image/18363j1w1g1V

hwnd · Accepted Answer · 2014-04-24 18:21:06Z

2

Since you prefer using a regular expression, here is one way to do it.

\b(?:id|class)\s*=\s*"([^"]*)"

The regular expression, explained:

\b             # the boundary between a word char (\w) and not a word char
(?:            # group, but do not capture:
  id           #   'id'
|              # OR
  class        #   'class'
)              # end of grouping
\s*            # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
=              # '='
\s*            # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
"              # '"'
(              # group and capture to \1:
  [^"]*        #   any character except: '"' (0 or more times)
)              # end of \1
"              # '"'
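To see what the `\b` and `\s*` parts change in practice, here is a small JavaScript comparison of this pattern with the one in the previous answer, run against a made-up snippet (the sample markup and the helper `captures` are assumptions for illustration; the g flag is added so every match is found). `\s*` lets `class = "btn"` match, and `\b` stops `class` from matching inside `subclass`. Note that `\b` does not exclude `data-id`, because `-` is not a word character.

```javascript
// Pattern from this answer, with the g flag added so matchAll can find every match.
const spaced = /\b(?:id|class)\s*=\s*"([^"]*)"/g;
// Pattern from the previous answer, for comparison.
const strict = /(?:id|class)="([^"]*)"/gi;

// Hypothetical helper: collects capture group 1 from every match.
function captures(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const sample = '<a class = "btn" id="main" subclass="no" data-id="yes">';
console.log(captures(spaced, sample)); // btn, main, yes
console.log(captures(strict, sample)); // main, no, yes
```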

edited Apr 24, 2014 at 18:21

answered Apr 24, 2014 at 18:13

hwnd


Pedro Lobito · Accepted Answer · 2014-04-24 18:33:54Z

1

You may want to try this:

<?php

$css = <<< EOF
id="<extract this>"
class="<extract this>"id="<extract this2>"
class="<extract this3>"id="<extract this4>"
class="<extract this5>"id="<extract this6>"
class="<extract this7>"id="<extract this8>"
class="<extract this9>"
EOF;

preg_match_all('/(?:id|class)="(.*?)"/sim', $css, $classes, PREG_PATTERN_ORDER);
// $classes[1] holds the text captured by the first group of each match.
foreach ($classes[1] as $value) {
    echo $value . "\n";
}
    /*
    <extract this>
    <extract this>
    <extract this2>
    <extract this3>
    <extract this4>
    <extract this5>
    <extract this6>
    <extract this7>
    <extract this8>
    <extract this9>
    */
?>

DEMO:
http://ideone.com/Nr9FPt
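The loop above prints each attribute value as one string, so class="btn primary" comes out as a single entry. When the goal is a list of class and ID names for tidying stylesheets, the class values also need splitting on whitespace. A rough JavaScript sketch of that step, assuming double-quoted attributes as in the question; the function name `collectSelectors` and the sample markup are hypothetical:

```javascript
// Capture which attribute matched (group 1) as well as its value (group 2).
const attrRe = /\b(id|class)\s*=\s*"([^"]*)"/gi;

// Hypothetical helper: returns sorted, de-duplicated ID and class names.
function collectSelectors(html) {
  const ids = new Set();
  const classes = new Set();
  for (const [, attr, value] of html.matchAll(attrRe)) {
    const isClass = attr.toLowerCase() === "class";
    // A class attribute may hold several space-separated names; an id holds one.
    const names = isClass ? value.trim().split(/\s+/) : [value.trim()];
    for (const name of names) {
      // Skip empty values such as id="" or class="".
      if (name !== "") (isClass ? classes : ids).add(name);
    }
  }
  return { ids: [...ids].sort(), classes: [...classes].sort() };
}

const page = '<div id="main" class="card  wide"><p class="card note"></p></div>';
console.log(collectSelectors(page)); // ids: main; classes: card, note, wide
```

Using a Set per attribute keeps each name once even when it appears on many elements, which is usually what a stylesheet audit needs.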

answered Apr 24, 2014 at 18:33

Pedro Lobito

3 Comments

eveo Over a year ago

Exactly what I wanted. I just threw my giant HTML page into the CSS variable, ran it, and it neatly printed every ID and class on that HTML page. Thank you!

eveo Over a year ago

Tuga, what does the /sim mean?

Pedro Lobito Over a year ago

s modifier: single line. Dot matches newline characters.
i modifier: insensitive. Case-insensitive match (ignores case of [a-zA-Z]).
m modifier: multi-line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string).
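JavaScript regular expressions accept the same three flags, which makes their effect easy to try out. In this small sketch (the sample string and the helper `firstCapture` are made up for illustration), the pattern `(?:id|class)="(.*?)"` only captures a value that spans a line break when `s` is set, `i` is what lets `CLASS` match, and `m` changes nothing because the pattern uses neither `^` nor `$`.

```javascript
// A made-up attribute whose value is split across two lines.
const text = 'CLASS="one\ntwo"';

// Hypothetical helper: first capture for the pattern built with the given flags.
function firstCapture(flags) {
  const match = text.match(new RegExp('(?:id|class)="(.*?)"', flags));
  return match ? match[1] : null;
}

console.log(firstCapture("i"));   // null: "." cannot cross the newline
console.log(firstCapture("si"));  // "one\ntwo" (printed over two lines)
console.log(firstCapture("sim")); // same as "si": "m" only affects ^ and $
console.log(firstCapture("s"));   // null: without "i", "CLASS" does not match "class"
```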
