
I am using PhantomJS to retrieve CSS information from a page without executing its JavaScript. For example, here is the code snippet:

var page = require('webpage').create();

// Disable JavaScript so the page's own scripts do not run.
page.settings.javascriptEnabled = false;
page.open('file:///home/sample.html', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // Inject a helper script, then read the class of <body>.
        page.includeJs("file:///home/sample.js", function() {
            var bodyClass = page.evaluate(function() {
                return document.querySelector('body').className;
            });
            console.log(bodyClass);
        });
    }
});

If I disable JavaScript, the evaluate function always returns null. When I enable JavaScript, the evaluate function returns a value. Is there a way to disable the JavaScript in the page while still letting my included JavaScript run?

  • What do you mean by your last sentence? You want to run some javascript but not all? Commented Jun 7, 2015 at 11:20
  • @WhatisSober The page I open contains JavaScript code, and I don't want that code to be executed. But I also include a script that helps me retrieve the information I want, so I need the JavaScript I include from PhantomJS to work. Commented Jun 7, 2015 at 12:00

2 Answers


No

page.evaluate() executes JavaScript on the page. If you disable JavaScript in PhantomJS, then you effectively can't use page.evaluate() anymore, and with it goes every way of accessing DOM elements. page.includeJs() will not work either, because the included script cannot be executed on the page.

You can still access page.content, which provides the current page source (the computed source). You may try to use a DOM library to parse the source into a DOM object [1] or, if the task is simple, you may try to use regular expressions.

[1] Note that PhantomJS and node.js have different execution environments, so most node.js modules that deal with the DOM won't work.
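
For the simple case in the question (reading the class of the body element), a regular expression over the page source may be enough. The following is only a minimal sketch: the helper name bodyClassFromSource is made up for illustration, and the pattern assumes reasonably well-formed markup, so it is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: pull the class attribute of <body> out of raw HTML.
// Returns null when there is no <body> tag and '' when it has no class.
function bodyClassFromSource(html) {
    // Find the opening <body ...> tag first.
    var bodyTag = /<body\b[^>]*>/i.exec(html);
    if (!bodyTag) {
        return null;
    }
    // Accept double-quoted, single-quoted, or unquoted attribute values.
    var cls = /\sclass\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(bodyTag[0]);
    if (!cls) {
        return '';
    }
    if (cls[1] !== undefined) { return cls[1]; }
    if (cls[2] !== undefined) { return cls[2]; }
    return cls[3];
}

// Inside the page.open callback of a PhantomJS script one could then call:
//   console.log(bodyClassFromSource(page.content));
```

Because this only reads page.content, it does not depend on page.evaluate() and should keep working with javascriptEnabled set to false.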


3 Comments

What about removing all <script> tags from the document before either has been loaded. Is that possible? That would effectively disable JavaScript.
@Gajus If a script tag is in the DOM, then it has most likely already been executed. There is a very short window in which the opening <script> tag has been delivered to the browser while the remaining content of the script is still in transit. This is true for script elements that directly contain JavaScript. If a script element has a src attribute, then an additional JavaScript file will be downloaded, which you can abort(), but it will be tricky. I wouldn't say that it disables JavaScript. It just prevents it from being executed.
Thank you for the suggestion. I have added an answer suggesting the use of an HTTP proxy to achieve the above. stackoverflow.com/a/41893648/368691

As suggested by Artjom, there is no way to disable execution of the target website's JavaScript without disabling PhantomJS's ability to execute JavaScript on the page. However, there is a simple way to ensure that no scripts from the target website are executed (which achieves the same result in the end).

  1. Create an HTTP proxy that intercepts all requests.
  2. Detect responses with Content-Type: text/html.
  3. Remove all <script> tags from the document.

You can configure PhantomJS to use the proxy with the --proxy command-line option.

Use http-proxy to create a proxy server.

Use cheerio to remove, comment out, or otherwise invalidate the <script> tags.
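
The three steps above can be sketched roughly as follows. To keep the example free of dependencies, it swaps in Node's built-in http module and a regular expression in place of http-proxy and cheerio, so it illustrates the idea rather than the exact setup described here. The names stripScripts, copyHeaders and startProxy and the port 8080 are made up for illustration. It handles plain HTTP only (HTTPS would need extra work) and assumes UTF-8 responses.

```javascript
var http = require('http');

// Drop every <script>...</script> element from an HTML string.
// A regex keeps this sketch dependency-free; a parser such as cheerio is more robust.
function stripScripts(html) {
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

// Shallow-copy a headers object so it can be modified safely.
function copyHeaders(source) {
    var copy = {};
    Object.keys(source).forEach(function (name) { copy[name] = source[name]; });
    return copy;
}

// Hypothetical helper: start a plain-HTTP forward proxy on the given port.
function startProxy(port) {
    var server = http.createServer(function (clientReq, clientRes) {
        var target;
        try {
            // Step 1: a forward proxy receives the absolute URL in the request line.
            target = new URL(clientReq.url);
        } catch (e) {
            clientRes.writeHead(400);
            clientRes.end();
            return;
        }
        var requestHeaders = copyHeaders(clientReq.headers);
        // Ask for an uncompressed body so it can be edited as text.
        requestHeaders['accept-encoding'] = 'identity';

        var upstreamReq = http.request({
            hostname: target.hostname,
            port: target.port || 80,
            path: target.pathname + target.search,
            method: clientReq.method,
            headers: requestHeaders
        }, function (upstreamRes) {
            var contentType = String(upstreamRes.headers['content-type'] || '');
            if (!/^text\/html/i.test(contentType)) {
                // Step 2: anything that is not HTML passes through untouched.
                clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
                upstreamRes.pipe(clientRes);
                return;
            }
            var chunks = [];
            upstreamRes.on('data', function (chunk) { chunks.push(chunk); });
            upstreamRes.on('end', function () {
                // Step 3: remove the script elements before forwarding the page.
                var body = stripScripts(Buffer.concat(chunks).toString('utf8'));
                var responseHeaders = copyHeaders(upstreamRes.headers);
                delete responseHeaders['transfer-encoding'];
                responseHeaders['content-length'] = Buffer.byteLength(body);
                clientRes.writeHead(upstreamRes.statusCode, responseHeaders);
                clientRes.end(body);
            });
        });
        upstreamReq.on('error', function () {
            if (!clientRes.headersSent) {
                clientRes.writeHead(502);
            }
            clientRes.end();
        });
        clientReq.pipe(upstreamReq);
    });
    server.listen(port);
    return server;
}

// To run the proxy, call for example: startProxy(8080);
```

With the proxy running under Node, PhantomJS could be started with something like phantomjs --proxy=127.0.0.1:8080 followed by the script name, leaving javascriptEnabled at its default so that page.evaluate() keeps working.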
