
Is there a Python module for rendering an HTML page with JavaScript and getting back a DOM object?

I want to parse a page which generates almost all of its content using JavaScript.

  • Could you retitle the question to something like "emulating a browser DOM in python"? The current one doesn't really reflect the question. Commented Sep 26, 2008 at 23:37
  • When you say "dom" object, do you mean an HTML string containing the HTML from which a DOM would be constructed? Commented Mar 14, 2024 at 9:44

2 Answers

The big complication here is emulating the full browser environment outside of a browser. You can use stand-alone JavaScript interpreters like Rhino and SpiderMonkey to run JavaScript code, but they don't provide the complete browser-like environment needed to fully render a web page.

If I needed to solve a problem like this, I would first look at how the JavaScript renders the page. It is quite possible that it fetches data via AJAX and uses that to build the page.

I could then use Python HTTP and JSON libraries to fetch that data directly and work with it, with no need to access the DOM object at all. However, that is only one possible situation; I don't know the exact problem you are solving.
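As a rough sketch of that approach, assuming the page's script loads its data from a JSON endpoint (the function names and the request headers are my own illustration, not something taken from the page in question), the standard library is enough:

```python
import json
from urllib.request import Request, urlopen


def decode_payload(raw, encoding="utf-8"):
    """Turn the raw bytes of a JSON response into Python objects."""
    return json.loads(raw.decode(encoding))


def fetch_data(url, timeout=10.0):
    """Request the endpoint the page's script calls and decode the reply.

    The headers are an assumption: some servers only answer with JSON
    when the request looks like it came from an XMLHttpRequest.
    """
    request = Request(
        url,
        headers={
            "Accept": "application/json",
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urlopen(request, timeout=timeout) as response:
        return decode_payload(response.read())
```

You would find the real URL by watching the network tab of the browser's developer tools while the page loads, then call something like `fetch_data("https://example.com/api/items")`, where that URL is only a placeholder. Whether this works depends on the site: endpoints that need cookies, tokens or signed parameters take more effort.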

Other options include the Selenium approach mentioned by @Łukasz, or some kind of:

  • WebKit embedded craziness,
  • IE win32 scripting craziness, or
  • a pyxpcom-based solution (with added craziness).

All these have the drawback of requiring pretty much a fully running web browser for Python to play with, which might not be an option depending on your environment.
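For the Selenium route, a minimal sketch might look like the following. It assumes the `selenium` package (version 4 or later), Firefox and its driver are installed; `render_page` and `extract_links` are hypothetical helper names, and waiting for `document.readyState` is only a crude signal that scripts have finished, so a real page may need a more specific wait condition.

```python
from html.parser import HTMLParser


def render_page(url, timeout=15):
    """Load url in headless Firefox and return the HTML after scripts ran."""
    # Imported here so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Assumption: the page is "done" once the document reports complete.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

`render_page` hands back the rendered markup as a string, which you can then feed to whatever parser you prefer; `extract_links` just shows the idea with the standard library's `html.parser`. This still starts a real browser, so the drawback above applies in full.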


You can probably use python-webkit for it. It requires a running GLib and GTK, but that's probably less problematic than wrapping the parts of WebKit without GLib.

I don't know if it does everything you need, but I guess you should give it a try.

1 Comment

I think pywebkitgtk can only render the HTML page. Is it possible to get the XML source after rendering it? There aren't enough docs on that.
