
I'm trying to get all the CSS files of an HTML page from a URL.

I know that getting the HTML code itself is easy - just use the PHP function file_get_contents.

The question is: can I easily search through the HTML at a URL and retrieve the files (or contents) of all the CSS files it links to?

Note - I want to build an engine that collects a lot of CSS files, which is why just reading the source by hand is not enough.

Thanks,

  • Are you trying to use PHP to retrieve a page, then parse the page to get a list of CSS files? If so, how does javascript factor into that? (you tagged your question with javascript) Commented Sep 11, 2013 at 17:55
  • You'll probably need to load the response HTML into a DOM parser and start looking for link elements of type text/css, extracting the URL from them, and making a new file_get_contents request for each of them. Beyond that, you'll also need to parse out embedded style tags and inline style attributes throughout the HTML. Commented Sep 11, 2013 at 17:56
  • Josh - I updated my question. Of course reading the source is easy, but I need to do it for thousands of websites. Commented Sep 11, 2013 at 18:00
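The DOM-parser approach suggested in the comments can be sketched with PHP's built-in DOMDocument, no third-party library required. The HTML string below is just a stand-in for whatever file_get_contents would return from a real URL:

```php
<?php
// Stand-in for the markup fetched with file_get_contents($url)
$html = '<html><head>
<link rel="stylesheet" href="/css/main.css">
<link rel="icon" href="/favicon.ico">
<style>body { margin: 0; }</style>
</head><body></body></html>';

$doc = new DOMDocument();
// Real-world markup is rarely valid; silence the parser warnings
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// External stylesheets: <link rel="stylesheet" href="...">
$cssUrls = array();
foreach ($doc->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'stylesheet') {
        $cssUrls[] = $link->getAttribute('href');
    }
}

// Embedded <style> blocks have to be collected separately
$inlineCss = array();
foreach ($doc->getElementsByTagName('style') as $style) {
    $inlineCss[] = $style->textContent;
}

print_r($cssUrls);  // only /css/main.css; the favicon link is skipped
```

Each URL in $cssUrls can then be fetched with another file_get_contents call. Inline style attributes on individual elements would still need a separate pass (e.g. an XPath query for //*[@style]).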

2 Answers


You could try using http://simplehtmldom.sourceforge.net/ for HTML parsing.

require_once 'SimpleHtmlDom/simple_html_dom.php';

$url = 'http://www.website-to-scan.com'; // include the scheme, or file_get_html will look for a local file
$website = file_get_html($url);

// You might need to tweak the selector based on the website you are scanning
// Example: some websites don't set the rel attribute
// others might use less instead of css
//
// Some other options:
// link[href] - Any link with a href attribute (might get favicons and other resources but should catch all the css files)
// link[href="*.css*"] - Might miss files that aren't .css extension but return valid css (e.g.: .less, .php, etc)
// link[type="text/css"] - Might miss stylesheets without this attribute set
foreach ($website->find('link[rel="stylesheet"]') as $stylesheet)
{
    $stylesheet_url = $stylesheet->href;

    // Do something with the URL
}
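For the "Do something with the URL" step: most stylesheet hrefs are relative, so they have to be resolved against the page URL before they can be fetched. A minimal sketch under that assumption (the resolve_css_url helper name is my own, and it does not normalize ../ segments):

```php
<?php
// Turn a possibly-relative stylesheet href into an absolute URL.
// Hypothetical helper - not part of simple_html_dom.
function resolve_css_url($base, $href)
{
    // Already absolute
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    // Protocol-relative: //cdn.example.com/style.css
    if (substr($href, 0, 2) === '//') {
        return 'http:' . $href;
    }
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if ($href[0] === '/') {
        return $root . $href;            // root-relative
    }
    $dir = isset($parts['path']) ? rtrim(dirname($parts['path']), '/') : '';
    return $root . $dir . '/' . $href;   // document-relative
}

$css_url = resolve_css_url('http://www.example.com/blog/index.html', 'css/site.css');
// $css_url is now http://www.example.com/blog/css/site.css
// $css_text = file_get_contents($css_url);  // fetch once the URL is absolute
```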



You need to parse the HTML tags looking for CSS files. You can do it, for example, with preg_match - looking for a matching regex.

Regex which would find such files might be like this:

\<link .+href="\..+css.+"\>
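For what it's worth, a runnable version of that idea uses preg_match_all with a somewhat tighter pattern. It is still fragile - it assumes the rel attribute comes before href, double quotes, and well-formed tags - which is exactly the kind of brittleness the comments below warn about:

```php
<?php
$html = '<link rel="stylesheet" href="/css/main.css">'
      . '<link rel="icon" href="/favicon.ico">';

// Capture the href of every stylesheet <link>. [^>]+ keeps the match
// inside a single tag, but the rel-before-href attribute order is assumed.
preg_match_all('#<link[^>]+rel="stylesheet"[^>]+href="([^"]+)"#i', $html, $m);

print_r($m[1]);  // array('/css/main.css')
```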

Comments

  • Using regex to parse HTML is a recipe for disaster. -1
  • It was a fast comment, I didn't think too much about it, and I said it's just an example of how you can do it. You are right that it's not the best idea, but for simple purposes it's just fine. IMO it's overkill to use simpleHtml if you just need to find something simple.
  • No, it is always a bad idea. HTML is not a regular language, so if you want consistent results (generally what a programmer is shooting for) then you should use the appropriate tools. I agree simpleHTML isn't necessary, since PHP has DomDocument without adding a third-party library. However, I don't agree with your sentiment that using bad practices for "simple purposes" is okay. If you want reliable code, you should do it the right way every time.
  • Well, I agree with you 100%. My answer here is then wrong, but what I meant by simple purposes is stuff like parsing only one site, whose structure you know and which you know doesn't change. For random pages... it's a bad idea, true.
  • I do not agree @Chris Baker. It is still a well-defined language, and matching a CSS include is quite simple and should be favoured over using a DOM parser. Even a DOM parser can be wrong when the HTML is not valid, and then a regex mostly performs better. Of course I would invest some time to make the regex a little more fault tolerant, but I would go with that solution.
