I'm using PhantomJS as a crawler; if there is no JS in a page I can assume that it's completely loaded when onLoadFinished fires, but if there is JS in a page, I need to wait a bit to give the scripts a chance to do stuff. This is my current stab at detecting JS:
    var pageHasJS = page.evaluate(function () {
        return (document.getElementsByTagName("script").length > 0 ||
                document.evaluate("count(//@*[starts-with(name(), 'on')])",
                                  document, null, XPathResult.NUMBER_TYPE,
                                  null).numberValue > 0);
    });
This looks for <script> tags and for elements with an onsomething attribute.
Q1: Is there any other HTML construct that can sneak JS into a page? javascript: URLs do not count, because nothing will ever get clicked.
Q2: Is there a better way to do the second test? I believe it is not possible to do that with querySelector, hence resorting to XPath, but maybe there is some other feature that would accomplish the same task.
Q3: The crawler does not interact with the page once it is loaded. The onload event is the only legacy event attribute that I know of that fires in the absence of user interaction. Are there any others? In other words, would it be safe to replace the second test with document.evaluate("count(//@onload)", ...) or maybe even !!document.body.getAttribute("onload")?
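For Q2, one possible alternative to XPath is to walk every element and inspect its attribute names directly. This is only a sketch: the helper name hasEventHandlerAttribute is made up for illustration, and it assumes the object passed in behaves like a DOM Document (getElementsByTagName, plus an attributes collection on each element).

```javascript
// Hypothetical helper (not part of PhantomJS): returns true if any element
// in `doc` carries an attribute whose name starts with "on".
// Assumes `doc` exposes getElementsByTagName like a DOM Document.
function hasEventHandlerAttribute(doc) {
    var elements = doc.getElementsByTagName("*");
    for (var i = 0; i < elements.length; i++) {
        var attrs = elements[i].attributes;
        for (var j = 0; j < attrs.length; j++) {
            // Same case-sensitive prefix test as starts-with(name(), 'on').
            if (attrs[j].name.indexOf("on") === 0) {
                return true;
            }
        }
    }
    return false;
}
```

Because page.evaluate runs its callback inside the page, the helper would have to be defined inside that callback and called there as hasEventHandlerAttribute(document). Like the XPath version, it also matches attributes that merely begin with "on" without being event handlers.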
There is also the onunload attribute, but this should not concern you.
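If the second test is narrowed as Q3 suggests, an attribute selector can match a fixed attribute name on any element, not just on body. The sketch below is an assumption-laden illustration: hasLoadTimeHandler is a made-up name, and onerror is included only on the assumption that failed resource loads can also fire a handler without user interaction.

```javascript
// Hypothetical helper: true if any element declares an inline onload or
// onerror attribute. Assumes `doc` exposes querySelector like a DOM Document.
function hasLoadTimeHandler(doc) {
    // "[name]" matches any element that has the attribute, whatever its value.
    return doc.querySelector("[onload], [onerror]") !== null;
}
```

Unlike document.body.getAttribute("onload"), this would also catch an onload attribute on elements such as img or iframe.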