
Let's say I'm using urllib2 and cookiejar (like so) to get responses from websites. Now I'm looking for an easy way to use jQuery to essentially scrape data from the response returned by the web server.
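For reference, a minimal sketch of the cookie-aware fetching setup described above, assuming Python 3, where urllib2 and cookielib became `urllib.request` and `http.cookiejar`. The helper names `make_opener` and `fetch_html` are hypothetical; only `fetch_html` touches the network.

```python
import http.cookiejar
import urllib.request


def make_opener():
    """Build a cookie-aware opener (Python 3 names for urllib2 + cookielib)."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in `jar`
    # and sends them back on later requests made through this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_html(opener, url):
    """Return the raw response body as bytes (requires network access)."""
    with opener.open(url) as response:
        return response.read()
```

The bytes returned by `fetch_html` are what would then be handed to an HTML parser for the jQuery-style querying.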

I understand that there are other modules that can be used in Python for web scraping (like), but is it possible with just jQuery commands? I'm assuming I'd need some sort of JS parser within Python?

The reason I want to use jQuery is that I have ~20 Greasemonkey scripts (mostly written by others) that make some interesting modifications to numerous websites and web games. They do all of their DOM modifications with jQuery. Instead of completely refactoring most of this working and dependable code, I'd like to be able to simply port it to Python (enabling simple and effective automation).

2 Answers


pyquery is perfectly suited to this task.

It lets you use jQuery-like selectors on (X)HTML/XML from Python.

For example:

>>> from pyquery import PyQuery as pq
>>> d = pq('<html><p id="hello" class="hello">Foo</p></html>')

>>> d("#hello")
[<p#hello.hello>]

>>> d('p:first')
[<p#hello.hello>]

See the complete API documentation for details, and the project page on Bitbucket for the source and issue tracker.


1 Comment

Thanks for this... Didn't know it existed! Going to have to contribute to it to keep it up to date (lots of simple issues are reported with no one contributing...)

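Use lxml to parse the HTML and its cssselect module to query it:

from lxml import etree
from lxml.cssselect import CSSSelector

# `document` is a filename or file-like object (e.g. the urllib2 response).
# Pass an HTMLParser: plain etree.parse() expects well-formed XML.
tree = etree.parse(document, etree.HTMLParser())
elements = CSSSelector('div.content')(tree)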

3 Comments

Thanks for this generic XML parsing example. It will definitely be useful in the future.
@g19fanatic BTW, pyquery not only works on HTML/XML passed in as a string, but also on element trees created by lxml. So the two work together nicely.
@LukasGraf, thanks for the additional info! After looking through some of the documentation, I noticed that pyquery essentially IS using the cssselect module, but with additional jQuery-ish syntax :) Definitely an easy transition from the Greasemonkey scripts.
