I am trying to parse a review from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

I am using the following approach:

Code

# html is a variable that contains the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag

Output

0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range

P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, it works without any problem, but here I get no result. Please help!

  • I don't know why Chrome showed tbody in the XPath :( Commented Jul 12, 2012 at 19:24

1 Answer

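Try removing /tbody from the XPath: there is no <tbody> in #productReviews. Browsers add <tbody> to the DOM when they parse a table, so an XPath copied from a browser tool can include it even though the raw HTML that lxml sees does not.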

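import urllib2
from lxml import etree

# Download the raw HTML of the review page
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)
# Same XPath as in the question, minus the /tbody step
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
# text() results are plain strings, so print the value itself (there is no .tag)
print r[0]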

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind.  so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time.  seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
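
To see the effect without fetching the live page, here is a minimal sketch on a made-up table. The `SAMPLE_HTML` markup and the review strings are assumptions for illustration, not the real Amazon page, and the sketch is written for Python 3 rather than the Python 2 used above. It compares the XPath with /tbody, the XPath without it, and a variant using `//tr` that should match whether or not a <tbody> is present.

```python
from lxml import etree

# Hypothetical, simplified markup standing in for the real review page.
# Note that the source contains no <tbody> element.
SAMPLE_HTML = """
<html><body>
<table id="productReviews">
  <tr><td><div>first review</div><div>second review</div></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# XPath as copied from a browser tool: expects a <tbody> that lxml never adds.
with_tbody = tree.xpath("//*[@id='productReviews']/tbody/tr/td[1]/div/text()")

# Same path with the /tbody step removed.
without_tbody = tree.xpath("//*[@id='productReviews']/tr/td[1]/div/text()")

# Descendant axis (//tr) matches rows with or without an intervening <tbody>.
either = tree.xpath("//*[@id='productReviews']//tr/td[1]/div/text()")

print(len(with_tbody))
print(without_tbody)
print(either)
```

Checking the length of the result before indexing it (for example `if r: print(r[0])`) avoids the IndexError when an XPath matches nothing.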

2 Comments

I can only accept an answer 15 minutes after posting the question; I will do that in 3 minutes.
@dragosrsupercool It's not a silly mistake, read here: stackoverflow.com/a/5586627/1167879
