I am trying to extract some text and links from instapaper.com, using the following code:

>>> import lxml.html as lh
>>> doc = lh.parse("http://www.instapaper.com/u/folder/1227370/programming")
>>> text = doc.xpath(".//*[@id='bookmark_list']/*/div[3]/a/text()")
>>> len(text)
0
>>> text
[]

As you can see, it returns an empty list, which means it cannot find any text matching the XPath above.

However, when I use the same XPath expression in Firebug/FirePath, it works fine.

[Screenshot: FirePath results for the XPath expression]

As the image above shows, it finds 40 matching nodes.

So my question is: why does this XPath expression not work with Python/lxml?

As requested Instapaper page source

1
  • Try removing the first period character. Commented Aug 6, 2012 at 10:13

1 Answer

5

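The page that lxml downloads has no element with the ID bookmark_list. lh.parse fetches the URL without your browser's session, so you probably have to be logged in to get the bookmark list.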

Edit

When parsing the real HTML, it works:

import lxml.html as lh

# Parse the saved copy of the logged-in page instead of the live URL
doc = lh.parse("http://pastebin.com/raw.php?i=1WpFAfCt")
text = doc.xpath("//*[@id='bookmark_list']/*/div[3]/a/text()")
len(text) # => 40
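
To check this kind of problem yourself, it can help to test whether the container element exists at all in the HTML that lxml received, before blaming the XPath. The following is only a sketch: the two HTML strings are made-up stand-ins (not Instapaper's real markup) for what a logged-in and an anonymous request might return, and diagnose is a hypothetical helper name.

```python
import lxml.html as lh

XPATH = ".//*[@id='bookmark_list']/*/div[3]/a/text()"

# Hypothetical stand-in for the page a logged-in browser sees.
LOGGED_IN_HTML = """
<html><body>
  <div id="bookmark_list">
    <div class="item">
      <div>one</div>
      <div>two</div>
      <div><a href="http://example.com/a">First article</a></div>
    </div>
    <div class="item">
      <div>one</div>
      <div>two</div>
      <div><a href="http://example.com/b">Second article</a></div>
    </div>
  </div>
</body></html>
"""

# Hypothetical stand-in for what an anonymous request might get back.
LOGGED_OUT_HTML = """
<html><body>
  <form id="login_form"><input name="username"></form>
</body></html>
"""


def diagnose(html):
    """Return (container_found, link_texts) for one HTML string."""
    doc = lh.fromstring(html)
    # First check that the element the XPath starts from is present.
    container_found = len(doc.xpath("//*[@id='bookmark_list']")) > 0
    # Then run the full expression; convert results to plain str.
    link_texts = [str(t) for t in doc.xpath(XPATH)]
    return container_found, link_texts


logged_in = diagnose(LOGGED_IN_HTML)
logged_out = diagnose(LOGGED_OUT_HTML)
print(logged_in)
print(logged_out)
```

If the container check is False, the expression is fine and the downloaded HTML is simply not the page you see in the browser.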

1 Comment

Nice catch. Yes, I am logged in.
