view actual html source of javascript generated html page

Question

Let's say I have a bit of JavaScript code that is passed a string from PHP containing an entire HTML page. I write the string to the current document and then alter one of its elements. Something like this:

<script type="text/javascript">
// json_encode() turns the PHP string into a valid, quoted JavaScript string literal
var foo = <?php echo json_encode($html_document); ?>;
document.open();
document.write(foo);
document.close();
document.getElementById("some_id_within_html_document").innerHTML = "some stuff";
</script>

This gives me my desired output, and everything looks great, except when you view the source of this page. If I wanted to scrape this page later and do the same thing, the source shows the JavaScript instead of the HTML interpreted by the browser. Using this method, how could I scrape the resulting HTML instead of the JavaScript that generates it? I have already circumvented this issue by processing the string in PHP instead; however, I am still curious whether it is possible to get the interpreted HTML this way when viewing the source or scraping the page.

Edit: Great responses across the board; I learned a lot about what is actually going on here and what practices I should stay away from. The simplest solution that would take the least effort in relation to my original problem was given by Justin Wood.

You realise that's an oxymoron? If the page is generated by script, it has no source markup. However, the innerHTML property is supposed to be a markup equivalent based on the HTML fragment serialisation algorithm. Note that serialising a document fragment, then turning the result back into a fragment with an HTML parser, may not produce exactly the same result as the original.
– RobG, Commented Oct 3, 2012 at 0:55

rsp · Accepted Answer · 2012-10-03 01:05:21Z

6

Not exactly sure what you are trying to do but you can see the HTML equivalent to the generated/modified DOM using something like:

document.documentElement.innerHTML

or:

document.getElementById("some_id").innerHTML

See DEMO.
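Note that `document.documentElement.innerHTML` returns only what is inside the `<html>` element, so the `<html>` start tag, its attributes and the doctype are missing from the result. A minimal sketch, assuming a browser-style `Document` object, that rebuilds the doctype from `document.doctype` and appends `documentElement.outerHTML` instead. The name `serializeDom` is hypothetical and not part of any browser API.

```javascript
// Hypothetical helper: returns markup for the *current* DOM (after scripts ran),
// including the doctype and the <html> element itself.
function serializeDom(doc) {
  let doctype = "";
  const dt = doc.doctype; // DocumentType node, or null if the page has none
  if (dt) {
    doctype = "<!DOCTYPE " + dt.name;
    if (dt.publicId) {
      doctype += ' PUBLIC "' + dt.publicId + '"';
    } else if (dt.systemId) {
      doctype += " SYSTEM";
    }
    if (dt.systemId) {
      doctype += ' "' + dt.systemId + '"';
    }
    doctype += ">\n";
  }
  // outerHTML includes the <html> start/end tags; innerHTML would drop them
  return doctype + doc.documentElement.outerHTML;
}

// In a browser console or bookmarklet:
// console.log(serializeDom(document));
```

Like `innerHTML`, this is a serialization of the live DOM, so (as RobG points out) it is an equivalent of the markup rather than the bytes the server originally sent.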

You can create a bookmarklet that includes this code:

alert(document.documentElement.innerHTML);

to see the HTML of the DOM modified by JavaScript on every page that you view.

Update:

If you want to do some Web scraping on your server where you want to download some external Web page, execute its JavaScript and then see the HTML that corresponds to the DOM after the JavaScript is executed (with the document.write calls and all that) then try using Zombie or Phantom. See also Mink for a PHP tool that supports Zombie.

More generally, look for a headless browser with a JavaScript engine.

Contrary to what people write in other answers here, it is actually possible.
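Zombie and Phantom are the tools named above; as a swapped-in example of the same headless-browser approach, here is a minimal sketch using Puppeteer, a Node.js library that drives headless Chrome. It assumes Node.js with the `puppeteer` package installed; `fetchRenderedHtml` is a hypothetical helper name, and the browser launcher is passed in so the function can be tried without a real browser.

```javascript
// Sketch: load a page in a headless browser, let its JavaScript run,
// then return the serialized DOM instead of the original source.
// Assumption: `launch` behaves like puppeteer.launch().
async function fetchRenderedHtml(url, launch) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // "networkidle0": treat navigation as done once there have been
    // no network connections for at least 500 ms
    await page.goto(url, { waitUntil: "networkidle0" });
    // page.content() returns the current HTML of the page, doctype included
    return await page.content();
  } finally {
    await browser.close(); // always shut the browser down, even on errors
  }
}

// Usage with the real library (Node.js, CommonJS):
// const puppeteer = require("puppeteer");
// fetchRenderedHtml("https://example.com/", () => puppeteer.launch())
//   .then((html) => console.log(html));
```

Pointed at the page from the question, the returned string would contain the markup written by `document.write` and the modified `innerHTML`, not the `<script>` that produced them.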

edited Oct 3, 2012 at 1:05

answered Oct 3, 2012 at 0:28

rsp


1 Comment

Jeffrey Benjamin Brown Over a year ago

When I try this I get the code that generates the HTML (inside the <script> tags in the <head> of the document); I don't get the HTML that it would generate.

Justin Wood · Accepted Answer · 2012-10-03 00:13:04Z

1

Don't pass your PHP variable into the JavaScript. Just output the variable itself, then use JavaScript to edit whatever you want to change:

<?php
$html = "<html><head><title></title></head><body><p id='p'>Something</p></body></html>";

echo $html;
?>

<script type="text/javascript">
  document.getElementById("p").innerHTML = "blah";
</script>

Something like that should work for you.

NOTE: I have only tested this in Chrome, Firefox, and Safari.

answered Oct 3, 2012 at 0:13

Justin Wood


deceze · Accepted Answer · 2012-10-03 00:08:02Z

0

You don't. The HTML is not in the source, period. The original HTML contains JavaScript that needs to be executed. That JavaScript manipulates the DOM of the page to add more things to it. The original HTML doesn't change; it still contains only the JavaScript.

If you want to "scrape" JavaScript-generated content, you always need to parse and execute the whole page, including its JavaScript and a DOM, and then evaluate the resulting changed DOM.

answered Oct 3, 2012 at 0:08

deceze♦


1 Comment

Will Sampson Over a year ago

Curious. I am running PHP with the CodeIgniter framework, and I am sure there is a way to do this. I'll look into it, thanks!

JCOC611 · Accepted Answer · 2012-10-03 00:08:23Z

-1

Since JavaScript is a client-side language, it doesn't get executed when you view the source of a page, hence the discrepancy between the visual result and the source. You would have to replace the JS with PHP or another server-side language to achieve the same result.

Alternatively, if you still want to use JavaScript, you would have to view the DOM (the document object, which holds all the HTML nodes) after the JavaScript has been executed. One way to do this is with the inspector in Chrome (Ctrl + Shift + I, or Right Click -> Inspect Element).

answered Oct 3, 2012 at 0:08

JCOC611


Ali Kayn · Accepted Answer · 2022-10-29 08:51:10Z

-2

Stepping aside from the JavaScript part, are you really trying to "view source", which used to be a simple option in browsers? A plain view that helps you find typos, etc.?

In Chrome that is Ctrl+U. It is no longer a menu option, but it still works as of 2022-10-29.

answered Oct 29, 2022 at 8:51

Ali Kayn


2 Comments

A. Khaled Over a year ago

He wants to get the code programmatically.

Aaron Meese Over a year ago

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
