How can I convert a web page with JavaScript to plain HTML?

Question

I want to convert some web pages that use JavaScript to plain HTML. I have found several ways to do this (please tell me if I'm wrong):

Use Jython (example: http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/)
Use Java with HtmlUnit
Use a proxy (example: http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/)
Use Python with Qt or PyV8

Although Python is my first choice, I only want to build a small tool for my own needs, and installing V8 and Qt seems somewhat complicated for that.

So I tried to build a proxy with Gecko, but it seems to need a DISPLAY, which I don't have on a remote Linux server.

Now I am trying Jython, but there seems to be no simple way to convert a whole page to plain HTML.

What I really want to know is: is there a way to convert a web page that contains JavaScript to plain HTML, the way a browser renders it? Can Node.js do this job?

Render it with Selenium/Ghost.py and dump the DOM into an HTML file.
– Blender, Commented Oct 21, 2013 at 3:18
Yeah, that... Do you want to remove all JavaScript from a page? That can be done easily with a regular expression.
– Nicolas Straub, Commented Oct 21, 2013 at 3:19
@JoshuaSmock I'm just trying to get the content generated by JavaScript.
– WKPlus, Commented Oct 21, 2013 at 6:57
@NicolásStraubValdivieso I am trying to extract the content generated by JS, so I can't just remove the scripts.
– WKPlus, Commented Oct 21, 2013 at 6:59
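Blender's suggestion (render the page, then dump the DOM) can be sketched in Python, the asker's preferred language. This is a minimal sketch, not code from the thread. It assumes the Selenium Python bindings and a headless Chrome or Firefox, which do not need an X DISPLAY. The names `render_to_html`, `page_is_ready` and `Driver` are hypothetical, and treating `document.readyState == "complete"` as "finished" is an assumption: pages that keep loading data via AJAX after that point need a page-specific `is_ready` check.

```python
import time
from typing import Callable, Protocol


class Driver(Protocol):
    """The two WebDriver methods this sketch relies on (Selenium's WebDriver has both)."""

    def get(self, url: str) -> None: ...

    def execute_script(self, script: str) -> object: ...


def page_is_ready(driver: Driver) -> bool:
    # Assumed default: the document has finished loading. Pages that keep
    # fetching data via AJAX afterwards need a stricter, page-specific check.
    return driver.execute_script("return document.readyState") == "complete"


def render_to_html(
    driver: Driver,
    url: str,
    is_ready: Callable[[Driver], bool] = page_is_ready,
    timeout: float = 10.0,
    poll_interval: float = 0.2,
) -> str:
    """Load `url`, poll until `is_ready(driver)` is true, then return the DOM as HTML."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while not is_ready(driver):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} s: {url}")
        time.sleep(poll_interval)
    # outerHTML serializes the DOM as it is *after* the scripts have run,
    # i.e. the JavaScript-generated content the asker wants to extract.
    return str(driver.execute_script("return document.documentElement.outerHTML"))


# Example usage (assumes `pip install selenium` and a local Chrome):
#
#     from selenium import webdriver
#
#     options = webdriver.ChromeOptions()
#     options.add_argument("--headless")  # no X DISPLAY needed
#     driver = webdriver.Chrome(options=options)
#     try:
#         html = render_to_html(driver, "https://example.com/")
#     finally:
#         driver.quit()
```

Taking the driver as a parameter keeps the waiting and dumping logic independent of which browser is used, and `is_ready` is the hook for a stricter "page is finished" condition.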

Brad · Accepted Answer · 2013-10-21 03:52:47Z


I've recently built a server on top of PhantomJS that does this. I highly recommend this route.

http://phantomjs.org/

Basically, you write a short script that has PhantomJS load the page, and configure a trigger method that tells you when the page has finished and then sends the data off. My version used PhantomJS's built-in HTTP server, so PhantomJS served up the results on its own. This takes about 15 lines of code. (Sorry, I can't paste it here; I wrote it on work time. But check out the example on their home page. It's almost complete!)
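Brad's code is not shown, and PhantomJS is no longer maintained, so the following is only a hypothetical Python analogue of the "serve the results over HTTP" part of his setup, using the standard library's `http.server` instead of PhantomJS's own server. A client requests `/?url=<page>` and gets back whatever the injected `render` function returns. `make_server` and `render` are illustrative names; `render` is assumed to load the page in a headless browser, wait for it to finish, and return the DOM as an HTML string.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable
from urllib.parse import parse_qs, urlparse


def make_server(
    render: Callable[[str], str], host: str = "127.0.0.1", port: int = 8080
) -> HTTPServer:
    """Build an HTTP server that answers GET /?url=<page> with render(<page>).

    `render` is a hypothetical callable that loads the page in a headless
    browser and returns the rendered DOM as an HTML string.
    """

    class SnapshotHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Read the target page from the ?url= query parameter.
            targets = parse_qs(urlparse(self.path).query).get("url")
            if not targets:
                self.send_error(400, "missing ?url= parameter")
                return
            try:
                html = render(targets[0])
            except Exception:
                # Any rendering failure is reported as a bad gateway.
                self.send_error(502, "rendering failed")
                return
            body = html.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    # HTTPServer handles one request at a time, so a single browser
    # instance behind `render` is never asked for two pages at once.
    return HTTPServer((host, port), SnapshotHandler)
```

For example, `make_server(my_render, port=8080).serve_forever()`, where `my_render` is a hypothetical function wrapping a headless browser, would answer `curl 'http://127.0.0.1:8080/?url=https%3A%2F%2Fexample.com%2F'` with the rendered page. The server is deliberately single-threaded (`HTTPServer` rather than `ThreadingHTTPServer`) because one browser instance should not render two pages concurrently.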



2 Comments

WKPlus Over a year ago

Thanks, PhantomJS resolves my problem.

dreamflasher Over a year ago

Any chance of bringing phantomjs.org online again?
