I want to convert some web pages that use JavaScript to plain HTML, and I have found several ways to do it (please tell me if I'm wrong):
- Use Jython, an example: http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/
- Use Java together with HtmlUnit
- Use a proxy, an example: http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/
- Use Python together with Qt or PyV8
I only want to make a tiny tool for this. Python is my first choice, but installing V8 and Qt looked somewhat complicated.
So I tried to make a proxy with Gecko, but it seems to need a DISPLAY, which I do not have on a remote Linux server.
Now I am trying Jython, but there seems to be no simple way to just convert a whole page to plain HTML.
What I actually want to ask is: is there a way to convert a web page that contains JavaScript to plain HTML, just as the browser does? Can Node.js do this job?