Not able to parse html using lxml Xpath parser

Question

I am trying to parse a review from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

I am using the following approach:

Code

# html: a variable which contains the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag

Output

0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range

P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, I get the result easily. But there is no result here, please help!

Don't know why Chrome showed tbody in the XPath :(

– codersofthedark, Commented Jul 12, 2012 at 19:24

fedosov · Accepted Answer · 2012-07-12 19:14:24Z

7

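Try removing /tbody from the XPath: there is no <tbody> in #productReviews in the HTML that lxml receives. Browsers add <tbody> to tables when they build the DOM, which is why an XPath copied from a browser tool includes it.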

import urllib2
from lxml import etree

# Fetch the raw page source (Python 2)
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)
# Same XPath as in the question, minus /tbody
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind.  so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time.  seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
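
To see the effect without fetching the live page, here is a minimal sketch on a made-up table. The markup and the div index are assumptions for illustration only, not the real Amazon page; the point is just that lxml keeps the rows directly under the table when the source has no <tbody>. The single-argument print() calls are written so the snippet should run on both Python 2 and Python 3.

```python
from lxml import etree

# Hypothetical, simplified markup standing in for the review table.
# As in the raw page source, the rows sit directly under <table>, with no <tbody>.
html = """
<html><body>
<table id="productReviews">
  <tr><td><div>first review</div><div>second review</div></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(html)

# XPath as a browser tool would report it (the browser DOM contains <tbody>)
with_tbody = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[2]/text()")
# The same path without the /tbody step
without_tbody = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[2]/text()")

print(len(with_tbody))   # 0: nothing matches, so indexing [0] would raise IndexError
print(without_tbody)     # the text of the second div
```

Checking the length of the result before indexing it, as above, makes an empty match visible instead of failing with IndexError.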



2 Comments

codersofthedark Over a year ago

I can only accept an answer 15 minutes after posting the question; I will do that in 3 minutes.

Alex Okrushko Over a year ago

@dragosrsupercool It's not a silly mistake, read here: stackoverflow.com/a/5586627/1167879
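
On the tbody point raised in this thread: if the same XPath has to work on both raw source and markup that already contains <tbody>, one option is a union of the two paths. This is only a sketch; review_rows is a hypothetical helper, and the two one-row tables are invented for the example.

```python
from lxml import etree

def review_rows(tree, table_id):
    """Return the <tr> elements of a table whether or not it has a <tbody>."""
    # The union (|) matches rows directly under the table and rows under tbody.
    # $tid is an XPath variable, filled in from the keyword argument below.
    path = "//*[@id=$tid]/tr | //*[@id=$tid]/tbody/tr"
    return tree.xpath(path, tid=table_id)

# Invented inputs: one without <tbody> (like raw source), one with it (like a browser DOM)
raw = etree.HTML('<table id="productReviews"><tr><td>a</td></tr></table>')
dom_like = etree.HTML(
    '<table id="productReviews"><tbody><tr><td>a</td></tr></tbody></table>'
)

print(len(review_rows(raw, "productReviews")))
print(len(review_rows(dom_like, "productReviews")))

# text() results are string objects, so use the value itself rather than .tag,
# and guard against an empty result before indexing.
texts = raw.xpath("//*[@id='productReviews']/tbody/tr/td/text()")
first = texts[0] if texts else None
print(first)
```

Both calls should find the single row, while the last lookup finds nothing because the raw table has no <tbody>.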
