4

I'm using lxml.html for some HTML parsing in Python. I'd like a rough estimate of where each element would be located on the page once a browser has rendered it. It does not have to be exact, just generally correct. For simplicity, I will ignore the effects of JavaScript on element location. As an end result, I would like to be able to iterate over the elements (e.g., via lxml) and find their x/y coordinates. Any thoughts on how to do this? I don't need to stay with lxml and am happy to try other libraries.

2 Comments

  • You will need an HTML rendering engine to get this information. A parser won't help. Commented Dec 3, 2010 at 11:56
  • You'll also need to consider the effect of CSS. Very little content is rendered without it these days. Commented Dec 3, 2010 at 12:05

2 Answers

5

You can do this with PyQt and its WebKit bindings, which render the page and expose each element's geometry:

import sys
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView


class MyWebView(QWebView):
    def __init__(self):
        QWebView.__init__(self)
        # Run showelements once the page has finished loading.
        self.loadFinished.connect(self.showelements)

    def showelements(self, ok):
        # documentElement() is the root element of the rendered frame.
        html = self.page().currentFrame().documentElement()
        for link in html.findAll('a'):
            # geometry() returns the element's bounding QRect.
            rect = link.geometry()
            print(link.toInnerXml(),
                  (rect.x(), rect.y(), rect.width(), rect.height()))


if __name__ == '__main__':
    app = QApplication(sys.argv)

    web = MyWebView()
    web.load(QUrl("http://www.google.com"))
    web.show()

    sys.exit(app.exec_())

1 Comment

This is fantastic. Is there a way to make this a little more command-line friendly, specifically quitting on its own (or operating on a sequence of URLs)? I have removed 'web.show()' and placed a 'sys.exit(0)' at the end of showelements.
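
One possible way to handle the request in this comment is to keep a list of URLs, load the next one each time loadFinished fires, and quit the application when the list is empty. The following is a minimal sketch of that idea, not a tested recipe. UrlQueue, advance, main and report are hypothetical names introduced here for illustration, and the explicit 1024x768 viewport size is an assumption: a view that is never shown may not have a usable size for layout otherwise.

```python
import sys


class UrlQueue(object):
    """Hands URLs to `load` one at a time; calls `on_done` when none are left."""

    def __init__(self, urls, load, on_done):
        self.urls = list(urls)
        self.load = load
        self.on_done = on_done

    def advance(self):
        if self.urls:
            self.load(self.urls.pop(0))
        else:
            self.on_done()


def main(urls):
    if not urls:
        return 0  # nothing to load, so never start the event loop

    # Imported here so the queue logic above can be used without PyQt4.
    from PyQt4.QtCore import QSize, QTimer, QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebView

    app = QApplication(sys.argv)
    web = QWebView()  # created but never shown
    # Assumption: give the unshown page an explicit viewport to lay out against.
    web.page().setViewportSize(QSize(1024, 768))

    queue = UrlQueue(urls, lambda url: web.load(QUrl(url)), app.quit)

    def report(ok):
        if ok:
            html = web.page().currentFrame().documentElement()
            for link in html.findAll('a'):
                rect = link.geometry()
                print(link.toInnerXml(),
                      (rect.x(), rect.y(), rect.width(), rect.height()))
        # Schedule the next load (or the quit) from the event loop
        # rather than from inside the signal handler.
        QTimer.singleShot(0, queue.advance)

    web.loadFinished.connect(report)
    queue.advance()  # start the first load
    return app.exec_()
```

To use it as a script, call sys.exit(main(sys.argv[1:])) at the bottom of the file and pass the URLs as command-line arguments. QApplication still expects a display to be available, so on a headless machine you would probably need something like Xvfb.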
1

As stated by Sven, you need an HTML rendering engine. A question on rendering HTML has been asked before; you could refer to it:

Python library for rendering HTML and javascript
