
I am writing a Chrome extension, and in its JS code I am trying to detect all images on a webpage. By all I mean:

  1. Images that are loaded once the webpage is loaded
  2. Images that are used as background (either in the CSS or inline html)
  3. Images that could be loaded after the webpage is done loading. For instance, when doing a Google image search it is easy to find all images, but once you click on one image to make it bigger, that image is not detected. The same thing happens when browsing social media websites.

The code that I have right now makes it easy to find the initial images (1). But I struggle with the other two parts (2) and (3).

Here is my current code in contentScript.js:

var images = document.getElementsByTagName('img');
for (var i = 0, l = images.length; i < l; i++) {
    //Do something
}

How should I modify it so that it can also detect the other images (2 and 3)?

I have seen a couple of questions about (2) on SO, like this one or this one, but none of the answers fully satisfies my second requirement, and none of the questions addresses the third.

  • Regarding point 3: couldn't you just use setInterval() and check whether any new images are in the DOM? Commented Sep 21, 2018 at 13:31
  • @filip That seems quite computationally heavy (especially if you want new images to be detected right away, which is one of my requirements). I was thinking more of something like catching events. Isn't there an event I could use to know that something has been added to the DOM, and then check whether what was added contains an image? Commented Sep 21, 2018 at 13:33
  • Found something called MutationObserver, which checks for changes in the DOM (for example adding an <img> tag) developer.mozilla.org/en-US/docs/Web/API/MutationObserver @LBes Commented Sep 21, 2018 at 13:38
  • Interesting @filip, will give this a try when I'm back from work Commented Sep 21, 2018 at 13:41


Live collection of imgs

As @vsync has said, finding all HTML images is as simple as var images = document.images. This is a live collection, so any images that are dynamically added to or removed from the page are automatically reflected in it.

Extracting background images (inline and CSS)

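There are a few ways to check for background images, but perhaps the most reliable is to iterate over all the page's elements and use window.getComputedStyle to check whether each element's backgroundImage is something other than none. This finds background images set both inline and in CSS stylesheets (but not ones set on ::before/::after pseudo-elements, which you would read with window.getComputedStyle( el, "::before" ) and so on).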

var images = [];
var elements = document.body.getElementsByTagName("*");
Array.prototype.forEach.call( elements, function ( el ) {
    var style = window.getComputedStyle( el );
    if ( style.backgroundImage !== "none" ) {
        // strip the surrounding url( ) and any quotes
        images.push( style.backgroundImage.slice( 4, -1 ).replace(/['"]/g, "") );
    }
} );

Reading the background image from window.getComputedStyle returns the full CSS background-image value in the form url(...), so you need to strip the url( and the ), as well as any " or ' surrounding the URL. You might do this with backgroundImage.slice( 4, -1 ).replace(/['"]/g, "").
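The slice( 4, -1 ) trick assumes the value is exactly one url(...). A computed backgroundImage can also list several layers (url("a.png"), url("b.png")) or contain gradients such as linear-gradient(...), and slicing those produces garbage. A minimal sketch, assuming you only want the url(...) parts, pulls them out with a regular expression instead (extractCssUrls is a hypothetical helper name, not part of any API):

```javascript
// Hypothetical helper: return every URL inside url(...) tokens of a CSS value.
// Handles double-quoted, single-quoted and unquoted forms; gradients and "none" yield nothing.
function extractCssUrls(value) {
    // group 1 = optional opening quote, group 2 = the URL, \1 = the matching closing quote
    const re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
    return Array.from(value.matchAll(re), match => match[2]);
}

// In the content script, each element's computed style can be fed through it:
// extractCssUrls(window.getComputedStyle(el).backgroundImage)
```

This returns an array per element, so an element with two background layers contributes two URLs instead of one mangled string.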

Only start checking once the DOM is ready, otherwise your initial scan might miss elements.
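In a Chrome extension this also cuts the other way: content scripts run at document_idle by default, so DOMContentLoaded may already have fired by the time your script executes, and a listener registered then would never run. A small sketch that covers both cases by checking document.readyState first (whenDomReady is a hypothetical name; the document is passed in as a parameter only so the function can be exercised outside a browser):

```javascript
// Hypothetical helper: run fn once the DOM has been parsed, whether or not
// DOMContentLoaded has already fired when this code runs.
function whenDomReady(doc, fn) {
    if (doc.readyState === "loading") {
        // Still parsing: wait for the event (the listener removes itself after one call)
        doc.addEventListener("DOMContentLoaded", fn, { once: true });
    } else {
        // "interactive" or "complete": the DOM is already available
        fn();
    }
}

// Usage in contentScript.js (scanPage would be your own initial scan):
// whenDomReady(document, scanPage);
```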

Dynamically added background images

This will not provide a live list, so you will need a MutationObserver to watch the document, and check any changed elements for the presence of backgroundImage.

When configuring your observer, make sure your MutationObserver config has childList and subtree set to true. childList reports child nodes being added or removed, and subtree extends that to every descendant of the observed element (in your case the body), not just its direct children.

var body = document.body;
var callback = function( mutationsList, observer ){
    for( var mutation of mutationsList ) {
        if ( mutation.type === 'childList' ) {
            // the inserted nodes are in mutation.addedNodes
            // (mutation.target is their parent), so iterate over
            // those as in the code sample above
        }
    }
};
var observer = new MutationObserver( callback );
var config = { attributes: false,
            childList: true,
            subtree: true };
observer.observe( body, config );
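One caveat with a childList-only config: with attributes: false the observer ignores changes to elements that already exist. Some pages (possibly including the Google Images case in the question, though that is an assumption) reuse an existing <img> and only change its src, or give an element a background by toggling its class or style, and neither of those adds a node. A sketch under that assumption which also watches a few attributes through the standard attributeFilter option (elementsToInspect, startWatching and the inspect callback are hypothetical names):

```javascript
// Watch insertions anywhere below the root, plus attribute changes that can
// swap or add an image on an element that is already in the page.
const WATCH_OPTIONS = {
    childList: true,
    subtree: true,
    attributes: true,
    attributeFilter: ["src", "srcset", "style", "class"]
};

// Turn a batch of MutationRecords into the list of elements worth re-checking.
function elementsToInspect(mutations) {
    const found = new Set(); // a Set avoids checking the same element twice per batch
    for (const mutation of mutations) {
        if (mutation.type === "attributes") {
            // An existing element whose src/srcset/style/class changed
            found.add(mutation.target);
        } else if (mutation.type === "childList") {
            for (const node of mutation.addedNodes) {
                if (node.nodeType !== 1) continue; // 1 === Node.ELEMENT_NODE; skip text/comments
                found.add(node);
                // The inserted subtree may already contain images
                for (const descendant of node.querySelectorAll("*")) found.add(descendant);
            }
        }
    }
    return Array.from(found);
}

// Browser-only wiring; inspect(el) is your own per-element check
// (tag name IMG, or a computed backgroundImage other than "none").
function startWatching(root, inspect) {
    const observer = new MutationObserver(mutations => {
        elementsToInspect(mutations).forEach(inspect);
    });
    observer.observe(root, WATCH_OPTIONS);
    return observer;
}
```

startWatching(document.body, inspect) could be called once the DOM is ready; the returned observer's disconnect() stops it. Watching attributes makes the callback fire more often, which is why the filter is limited to a few attributes.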

Since searching for background images requires you to check every element in the DOM, you might as well check for <img>s at the same time, rather than using document.images.

Code

You would want to modify the code above so that, in addition to checking if it has a background image, you would check if its tag name is IMG. You should also put it in a function that runs when the DOM is ready.

UPDATE: To differentiate between images and background images, you could push them to different arrays, for example images and bg_images. To also identify the parents of images, you would push each image's parentNode to a third array, e.g. image_parents.

var images = [],
    bg_images = [],
    image_parents = [];

// Record one element: an <img> src, or a computed background-image URL
function inspect( el ) {
    if ( el.tagName === "IMG" ) {
        images.push( el.src ); // save image src
        image_parents.push( el.parentNode ); // save image parent
        return;
    }
    var bg = window.getComputedStyle( el ).backgroundImage;
    if ( bg !== "none" ) {
        bg_images.push( bg.slice( 4, -1 ).replace( /['"]/g, "" ) ); // save background image url
    }
}

function init() {
    var body = document.body;

    /* When the DOM is ready find all the images and background images
        initially loaded */
    Array.prototype.forEach.call( body.getElementsByTagName( "*" ), inspect );

    /* MutationObserver callback to add images when the body changes */
    var callback = function ( mutationsList, observer ) {
        for ( var mutation of mutationsList ) {
            if ( mutation.type === 'childList' ) {
                // Inserted nodes are in addedNodes (not target.children);
                // an added node may itself contain images, so scan its descendants too
                Array.prototype.forEach.call( mutation.addedNodes, function ( node ) {
                    if ( node.nodeType !== Node.ELEMENT_NODE ) return; // skip text nodes
                    inspect( node );
                    Array.prototype.forEach.call( node.getElementsByTagName( "*" ), inspect );
                } );
            }
        }
    };
    var observer = new MutationObserver( callback );
    var config = { childList: true, subtree: true };

    observer.observe( body, config );
}

// Content scripts may run after DOMContentLoaded has already fired
if ( document.readyState === "loading" ) {
    document.addEventListener( "DOMContentLoaded", init );
} else {
    init();
}
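The arrays above keep every hit, so an image that is re-rendered, or reported by more than one mutation, is stored more than once. If you want each URL only once, a Map/Set-based collector is one option. A sketch (createImageCollector is a hypothetical name; it keeps the first parent seen for each image URL and ignores empty src values):

```javascript
// Hypothetical collector that stores each image URL once.
function createImageCollector() {
    const images = new Map();   // img src -> parent node where it was first seen
    const bgImages = new Set(); // background-image URLs

    return {
        addImage(src, parent) {
            // el.src is "" for an <img> without a src; skip those and repeats
            if (src && !images.has(src)) images.set(src, parent);
        },
        addBackground(url) {
            if (url) bgImages.add(url);
        },
        // Snapshots as plain arrays, mirroring images / image_parents / bg_images
        images: () => Array.from(images.keys()),
        imageParents: () => Array.from(images.values()),
        bgImages: () => Array.from(bgImages)
    };
}
```

In inspect() above, images.push / image_parents.push would become collector.addImage( el.src, el.parentNode ), and bg_images.push would become collector.addBackground( ... ).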

The first part of your answer states that "var images = document.images. This will be a live list so any images that are dynamically added or removed from the page will be automatically reflected in the list." I already have that, and it doesn't work at all on newly loaded content (cf. clicking an image on Google Images, or scrolling on social media).
@LBes interesting, the spec indicates it should be a live collection. document.images is quite an old standard, the HTML spec suggests getElementsByTagName("img") as another option. However using the MutationObserver solution and checking changed elements for IMG tag names as well as background images is likely to be the most robust and complete solution.
Which is what the second answer suggested, but see my comments there.
@LBes The error "parameter 1 is not of type 'Node'" means that the element you're attaching the observer to does not exist when your code runs. As suggested in my answer, use document.addEventListener('DOMContentLoaded', function () { to wait until the DOM loads; if the element loads after the DOM, see stackoverflow.com/questions/40398054/… for a solution that polls until the element is ready.
Thanks for the update. Yes, it was correctly set to true... I also don't understand why it wouldn't work. Accepting the answer anyway, as it is a very specific case, but I'd still like to figure it out. If you have ideas, feel free to send them my way :)
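The polling idea mentioned in the comments can be sketched as a Promise that re-runs a lookup until it returns something. This is not the code from the linked question; waitFor, the 100 ms interval and the 10 s timeout are assumptions for illustration:

```javascript
// Hypothetical helper: poll lookup() until it returns a truthy value, then resolve with it.
// Rejects after timeoutMs so a missing element does not leave the promise pending forever.
function waitFor(lookup, intervalMs = 100, timeoutMs = 10000) {
    return new Promise((resolve, reject) => {
        const started = Date.now();
        (function poll() {
            const result = lookup();
            if (result) {
                resolve(result);
            } else if (Date.now() - started >= timeoutMs) {
                reject(new Error("waitFor: timed out"));
            } else {
                setTimeout(poll, intervalMs);
            }
        })();
    });
}

// Usage: wait for <body> to exist, then attach the observer
// waitFor(() => document.body).then(body => observer.observe(body, config));
```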

For HTML images (which already exist by the time you run this):

document.images

For CSS images:

You would probably need to use a regex on the page's CSS (inline or in external files), but this is tricky because you would need to build the full URL out of the relative paths, and that might not always work.

Getting all css used in html file
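Building the full path is manageable if you know where the CSS came from: a relative url() in a stylesheet is resolved against that stylesheet's own URL (for inline styles, against the document's base URL), and the standard URL constructor does the joining. A sketch, assuming you already have the CSS text and the stylesheet's href (resolveCssUrls is a hypothetical name; note that rules from cross-origin stylesheets may not be readable from a page script at all):

```javascript
// Hypothetical helper: find url(...) references in CSS text and make them absolute.
// baseHref should be the stylesheet's own URL (or document.baseURI for inline styles).
function resolveCssUrls(cssText, baseHref) {
    const re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
    const urls = [];
    for (const match of cssText.matchAll(re)) {
        try {
            urls.push(new URL(match[2], baseHref).href); // relative -> absolute
        } catch (e) {
            // Skip values the URL parser rejects
        }
    }
    return urls;
}
```

For example, url(img/a.png) in https://example.com/css/site.css resolves to https://example.com/css/img/a.png, not to a path under the page's own directory.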


For delayed-loaded images:

You can use a mutation observer, as @filip has suggested in his answer.

OK, thanks for the pointers. +1. I was afraid that it would not always work.

This should solve your problem (3). I used a MutationObserver.

I observe targetNode for changes and register a callback that runs whenever a change happens.

For your case, targetNode should be the root element, so that changes anywhere in the document are detected.

In the callback I check whether the mutation added a node with the IMG tag.

    const targetNode = document.getElementById("root");

    // Options for the observer (which mutations to observe)
    const config = { attributes: true, childList: true, subtree: true };

    // Callback function to execute when mutations are observed
    const callback = function(mutationsList, observer) {
        for (const mutation of mutationsList) {
            // addedNodes is empty for attribute mutations and may hold several nodes,
            // so loop over it instead of reading addedNodes[0]
            for (const node of mutation.addedNodes) {
                if (node.tagName === "IMG") {
                    console.log("New Image added in DOM!");
                }
            }
        }
    };

    // Create an observer instance linked to the callback function
    const observer = new MutationObserver(callback);

    // Start observing the target node for configured mutations
    observer.observe(targetNode, config);


OK, this seems like the way to go. Upvoted! But I get "Failed to execute 'observe' on 'MutationObserver': parameter 1 is not of type 'Node'." on the line observer.observe(targetNode, config);
@LBes What did you set the targetNode var to? You can't just use document.body or something like that. I added an id to the <html> tag and then got it with document.getElementById("root");
I did use document.body. But then what do you suggest using? I'm not sure I understand what you mean.
@LBes HTML: <html id="root">....</html> JS: targetNode = document.getElementById("root");
That cannot work for me. It's a Chrome extension; it has to work on every page :s
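Regarding the last comment: an extension does not need the page to define an id, because document.documentElement is the <html> element of whatever page the script runs in. A sketch that observes it and also catches images nested inside an added subtree, which checking only the added node's own tagName would miss (addedImages, watchForImages and the onImage callback are hypothetical names):

```javascript
// Collect <img> elements from a batch of MutationRecords, including images
// nested inside an added subtree (e.g. a whole post inserted at once).
function addedImages(mutations) {
    const found = [];
    for (const mutation of mutations) {
        for (const node of mutation.addedNodes) {
            if (node.nodeType !== 1) continue; // elements only (1 === Node.ELEMENT_NODE)
            if (node.tagName === "IMG") found.push(node);
            found.push(...node.getElementsByTagName("img")); // nested images
        }
    }
    return found;
}

// Browser-only: document.documentElement is the <html> element of any page,
// so no id has to be added to the page.
function watchForImages(onImage) {
    const observer = new MutationObserver(mutations => addedImages(mutations).forEach(onImage));
    observer.observe(document.documentElement, { childList: true, subtree: true });
    return observer;
}
```

watchForImages(img => console.log(img.src)) would log each image inserted after the call; images already on the page still have to come from document.images or an initial scan.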