4

Is it possible to get the source of the current HTML document, exactly as it was loaded, in text form? (i.e. not the "Generated source" after parsing and DOM manipulation.)

Note: Issuing an extra AJAX request to retrieve the HTML page again is not an option in this case: The document could have changed.

Most browsers have a "view source" function, which provides exactly what I want, so browsers keep the original HTML content anyway. It would be nice if I could access that...

4
  • Is grabbing it on window.onload an option? Commented Jun 13, 2010 at 14:04
  • @Gert - Even then it could be heavily modified. At the very least, all the scripts in the <head> have already run. Commented Jun 13, 2010 at 14:06
  • @Gert It must be character-by-character the same as the document that was downloaded. Commented Jun 13, 2010 at 14:08
  • Good point, Nick. Then there's no way, as far as I can see. Commented Jun 13, 2010 at 14:10

3 Answers

4

You can't do this with JavaScript; the browser has no obligation to keep the original document. Is making an AJAX request with a timestamp an option? You could record the time the page loaded with new Date() and pass that timestamp to the server when asking for the document again, provided the server keeps a history of the document.
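
A rough sketch of that timestamp idea, assuming the server keeps a history and understands a query parameter for it. The parameter name `asOf` and the helper names `buildSnapshotUrl` and `fetchOriginalSource` are made up for illustration; nothing here is a standard API beyond `URL`, `Date` and `fetch`.

```javascript
// Hypothetical helper: builds the URL used to re-request the page as it was
// at load time. The "asOf" parameter is an assumption; the server would have
// to support it and keep old versions of the document.
function buildSnapshotUrl(pageUrl, loadedAt) {
  const url = new URL(pageUrl);
  url.searchParams.set("asOf", String(loadedAt.getTime()));
  return url.toString();
}

// In a browser: record the load time as early as possible, re-request later.
if (typeof window !== "undefined") {
  const loadedAt = new Date();
  window.fetchOriginalSource = () =>
    fetch(buildSnapshotUrl(window.location.href, loadedAt))
      .then((response) => response.text());
}
```

Even then, what comes back is only what the server believes it sent at that time, not necessarily what the browser actually received.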

Other than that...I'm not sure how you'd do this with JavaScript/HTML. What is your actual end-game goal here though? Are you seeing if a <form> and its inputs changed, or something else?

6 Comments

I'm thinking about ways to make it harder for an attacker to modify the page (by manipulating HTTP traffic). I would compute its MD5 sum and have that checked by a script loaded via HTTPS. It's just a rough idea -- I really don't know if I can make that work... (The page itself has to be loaded via HTTP due to SOP issues, but it's possible to include HTTPS scripts in such a page!)
@chris_l - With http/https you'll possibly have some cross-domain issues...I'd really press for that SOP getting changed. My current employer is ISO certified, we're under the same restrictions, but getting them changed for the overall good is worth it every time. You can put a hash in the page that verifies against something on the server, an IP/session variable changing every load, etc...but none of that really prevents man-in-the-middle attacks. HTTPS/SSL is definitely your best option, if you're able to push for that SOP getting changed at all.
@Nick: Oh, I meant "Same Origin Policy", not "Standard Operating Procedure" - I just realized the ambiguity of that abbreviation... :-) I'm afraid I can't change that: There will have to be images included from foreign HTTP pages (and I can't copy them to my own server).
@chris_l - Images are fine. As long as the scripts and the page itself are from the same scheme/domain, that's all that will be affected. Try your page and scripts over HTTPS, images with HTTP if necessary; it shouldn't be an issue for same-origin.
@Nick: Having HTTP images on an HTTPS page requires users to click away a message box like "This page contains insecure elements - do you want to show these elements Yes/No" - can't do that (and I also don't want to train people to click away warnings)...
3

As far as I know, there is no way of doing so.

You may try to grab the HTML very early and store it in a variable, but that's a very poor alternative because:

  • if very early is too early (before all DOM nodes are loaded), you'll run into trouble trying to get the innerHTML property
  • if very early is when the DOM is ready for manipulation, it might be too late already (if you have things like <script>document.write(stuff);</script> you may already be seeing a different view of the HTML content)
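
To make both problems concrete, here is a minimal sketch of such an early grab. The function name `snapshotParsedDocument` is made up; it only uses `documentElement`, `readyState` and `outerHTML`, and what it returns is the parser's serialization, not the downloaded text.

```javascript
// Serializes whatever the parser has built so far. This is the "generated
// source", so attribute quoting, whitespace and anything written by earlier
// scripts may already differ from what the server sent.
function snapshotParsedDocument(doc) {
  const root = doc.documentElement;
  // Too early: the parser has not created the <html> element yet.
  if (!root) return null;
  return { readyState: doc.readyState, html: root.outerHTML };
}

// In a browser, from an inline script placed as early as possible:
if (typeof document !== "undefined") {
  window.earlySnapshot = snapshotParsedDocument(document);
}
```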

Re-fetching the document with AJAX, despite its many possible implications, may be your best alternative regarding this matter.
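
A minimal sketch of the re-fetch, assuming `fetch` is available. The name `refetchSource` and the injectable `fetchImpl` parameter are only there for illustration and testing. Note that this returns whatever the server (or cache) answers now, which is exactly the caveat raised in the question.

```javascript
// Re-requests a URL and resolves with the response body as text.
// fetchImpl defaults to the global fetch; passing it in is an assumption made
// so the function can be exercised outside a browser.
async function refetchSource(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Re-fetch failed with status " + response.status);
  }
  return response.text();
}

// Browser usage (same origin, so no cross-domain restrictions apply):
// refetchSource(window.location.href).then((source) => console.log(source));
```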

1

A very hacky workaround would be to load the page using only JS: serve a blank page that makes a single AJAX call to fetch the actual content of the page.
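
Something along these lines, as a sketch only. The function name `loadThroughLoader` and the idea of a separate content URL are assumptions. The point is that the loader script holds the raw response text before anything gets parsed.

```javascript
// Runs on an otherwise blank loader page: fetches the real markup, keeps the
// untouched text, then hands it to the parser via document.write.
// doc and fetchImpl are parameters (an assumption) so this can be tested
// without a browser.
async function loadThroughLoader(contentUrl, doc, fetchImpl = fetch) {
  const response = await fetchImpl(contentUrl);
  const originalSource = await response.text();
  // Replace the blank loader page with the fetched markup.
  doc.open();
  doc.write(originalSource);
  doc.close();
  // The caller still has the exact text that was downloaded.
  return originalSource;
}

// Browser usage, with a hypothetical content URL:
// loadThroughLoader("/real-page.html", document).then((source) => { /* hash it, etc. */ });
```

The usual downsides apply: the page is useless without JavaScript, and the loader page itself still arrives over the same channel as before.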

However, before doing that, I'd rethink what you are trying to do and why you need the "saved state."
