I'm having trouble stripping JavaScript out of an HTML page using lxml in Python. When I run the code below, the output is:
"<Element div at 0x10cec4e10>"
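import urllib2

import lxml.html
from lxml.html.clean import Cleaner

# Configure the cleaner to strip JavaScript.
cleaner = Cleaner()
cleaner.javascript = True

text = urllib2.urlopen("URL").read().decode("utf-8")
test = lxml.html.fromstring(cleaner.clean_html(text))
print test  # prints the element's repr, e.g. <Element div at 0x...>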
What I'm trying to get is the page's text content with the JavaScript removed. Can someone shed some light? Thanks.
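
For reference, here is a minimal sketch of the result I'm after. It may not be the right approach. It assumes the text can be read from the parsed element with `text_content()`, and it swaps `Cleaner` for `etree.strip_elements` so that it depends only on core lxml. `SAMPLE_HTML` and `visible_text` are made-up names for illustration, and the inline string stands in for the downloaded page.

```python
import lxml.html
from lxml import etree

# Hypothetical sample page standing in for the downloaded HTML.
SAMPLE_HTML = """
<html><body>
<div>
  <p>Hello <b>world</b></p>
  <script>alert("hi");</script>
  <p>Bye</p>
</div>
</body></html>
"""


def visible_text(html):
    """Return the text of an HTML document with script/style content removed."""
    doc = lxml.html.fromstring(html)
    # Remove <script> and <style> elements; with_tail=False keeps any text
    # that follows the removed element.
    etree.strip_elements(doc, "script", "style", with_tail=False)
    # text_content() gives the text of the element and all its descendants,
    # instead of the element's repr. Collapse runs of whitespace afterwards.
    return " ".join(doc.text_content().split())


text = visible_text(SAMPLE_HTML)
print(text)
```

The sketch is written to run on Python 3 as well, which is why it uses `print(text)` rather than the `print test` form above.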