XPath returning empty list (namespace issue?)

Question

I expect the following code to return the text "In Stock" or "Out of Stock" (to check stock at an online store) but it returns only "[]". The XPath code was obtained from a browser's element inspector, and seems to be valid. I read online about namespaces possibly being the problem. Tips?

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//*[@id="content"]/section/section/div/font/div[7]/div/div[1]/div[2]/ul/li[1]/div/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
print(stock)

EDIT: Solution based on Padraic Cunningham's post.

Still not the most elegant due to its reliance on some absolute paths but at least this is working:

from lxml import html
import requests
import re

# in stock example URL
#url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'

# out of stock example URL
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878'

path = '//ul[@class="availability"]/li[./div[1]]'
inner_path = './div[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
current = stock[0].xpath(inner_path)

print(current[0])
if re.search(r'in.*stock.*online', current[0], flags=re.IGNORECASE):
    print("Success!")
else:
    print("Keep waiting...")
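
Note that `stock[0]` and `current[0]` above raise an IndexError if either XPath matches nothing, for example after a layout change. A minimal sketch of a guard, assuming a hypothetical helper `is_in_stock_online` that takes the list returned by `xpath()`; the sample strings are made up for illustration and are not copied from the site:

```python
import re

# Hypothetical helper (not part of the original post): accepts the list of
# text nodes returned by tree.xpath(...) and tolerates an empty result.
IN_STOCK_ONLINE = re.compile(r'in\s+stock\s+online', re.IGNORECASE)

def is_in_stock_online(texts):
    """Return True if any text node reads like 'In Stock Online'."""
    return any(IN_STOCK_ONLINE.search(t) for t in texts)

# Made-up sample inputs standing in for real xpath() results.
print(is_in_stock_online(['\n   In Stock Online\n']))      # True
print(is_in_stock_online(['\n   Out of Stock Online\n']))  # False
print(is_in_stock_online([]))                              # False
```

An empty list simply counts as "not in stock" here, which may or may not be what you want if the page structure changes.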

Padraic Cunningham · Accepted Answer · 2016-06-25 21:51:22Z


Your XPath is wrong:

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//ul[@class="availability"]/li[./div[@class="availability-text in-stock"]]'

page = requests.get(url)
tree = html.fromstring(page.content)

stock = tree.xpath(path)
current = stock[0].xpath('./div[@class="availability-text in-stock"]/text()')
print(current[0])
for node in stock[1:]:
    print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label'))

Which gives you:

  In Stock Online
In Stock   YORKDALE  MALL
In Stock   LAWRENCE SQUARE

The availability is in the unordered list with the availability class. Our XPath pulls all the li children that have a div with the availability-text in-stock class. Inside every one of those divs except the first, there is an anchor like:

            <a class="underline"
            aria-label="In Stock &nbsp; YORKDALE  MALL"
            title="View Store Details"
            href="#product-store-availability">
                YORKDALE  MALL</a>

You can see that the aria-label attribute contains both the availability and the store.

If you want to break it up into the availability and the store, you can split on the non-breaking space (&nbsp; in the source, "\xa0" in the parsed string):

print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label')[0].split("\xa0"))

Which would give:

['In Stock ', ' YORKDALE  MALL']
['In Stock ', ' LAWRENCE SQUARE']
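
The parts above still carry the spaces that surrounded the non-breaking space. A small sketch that trims them, using the label text from the anchor shown earlier; `split_label` is a hypothetical helper name, not something from lxml:

```python
def split_label(label):
    """Split an aria-label such as 'In Stock \xa0 YORKDALE  MALL' into
    (availability, store), trimming the spaces around the separator."""
    # partition() splits on the first non-breaking space only.
    availability, _, store = label.partition("\xa0")
    return availability.strip(), store.strip()

print(split_label("In Stock \xa0 YORKDALE  MALL"))
# ('In Stock', 'YORKDALE  MALL')
```

If the label has no non-breaking space, the store part comes back as an empty string rather than raising.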

Your browser tools are essential when it comes to scraping, but don't rely on the XPath or selector they give you when you right-click and choose "Copy XPath" or "Copy selector". Have a look at the source and try to find ids or class names that are associated with what you are trying to parse.
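
To illustrate the point about copied paths, here is a minimal sketch on made-up markup. The `promo` banner and the `page` helper are hypothetical and not taken from the real site. A positional path stops matching as soon as an extra element shifts the indices, while a class-based path keeps working:

```python
from lxml import html

def page(with_banner):
    """Build a tiny made-up page, optionally with an extra banner div."""
    banner = '<div class="promo">Sale!</div>' if with_banner else ''
    markup = (
        '<html><body><div id="content">' + banner +
        '<div><ul class="availability">'
        '<li><div class="availability-text in-stock">In Stock Online</div></li>'
        '</ul></div>'
        '</div></body></html>'
    )
    return html.fromstring(markup)

# Positional path: depends on the exact nesting and sibling order.
positional = '//*[@id="content"]/div[1]/ul/li[1]/div/text()'
# Class-based path: depends only on the class name of the list.
by_class = '//ul[@class="availability"]/li/div/text()'

for with_banner in (False, True):
    tree = page(with_banner)
    print(with_banner, tree.xpath(positional), tree.xpath(by_class))
```

With the banner present, `div[1]` points at the banner instead of the wrapper around the list, so the positional path returns an empty list.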

If you only want the first, you can still be specific with your xpath:

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '(//ul[@class="availability"]/li/div[@class="availability-text in-stock"])[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
success = {"in", "stock"}

# Succeed when both words appear in the text, e.g. "In Stock Online".
if stock and success.issubset(stock[0].lower().split()):
    print("Success")
else:
    print("Failure")

edited Jun 25, 2016 at 21:51

answered Jun 25, 2016 at 18:20

Padraic Cunningham


2 Comments

adatum Over a year ago

The availability-text in-stock class is only present if the product is in stock. For example, in an out-of-stock case:

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878'

the solution breaks, as the class becomes availability-text out-of-stock. I had thought my XPath was an absolute path that would avoid this issue.

Padraic Cunningham Over a year ago

If you don't find anything, that would mean it was out of stock, no? I will have a look when I get back to my notebook. Regardless, using a path like the one in your question is very brittle, even if it had worked.
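
One possible way to cover both cases raised in these comments is to match on the shared availability-text class token instead of the full class string. This is only a sketch: the two fragments and the "Out of Stock Online" wording are made up, only the class names come from the thread, and `first_status` is a hypothetical helper.

```python
from lxml import html

# Made-up fragments; only the class names are taken from the thread.
IN_STOCK = ('<ul class="availability"><li>'
            '<div class="availability-text in-stock"> In Stock Online </div>'
            '</li></ul>')
OUT_OF_STOCK = ('<ul class="availability"><li>'
                '<div class="availability-text out-of-stock"> Out of Stock Online </div>'
                '</li></ul>')

# Select the first div whose class list contains the token "availability-text",
# whatever other classes it carries.
PATH = ('(//ul[@class="availability"]/li/div['
        'contains(concat(" ", normalize-space(@class), " "), " availability-text ")'
        '])[1]')

def first_status(markup):
    """Return (text, in_stock) for the first availability div, or None."""
    nodes = html.fromstring(markup).xpath(PATH)
    if not nodes:
        return None
    div = nodes[0]
    in_stock = 'in-stock' in div.get('class', '').split()
    return div.text_content().strip(), in_stock

print(first_status(IN_STOCK))
print(first_status(OUT_OF_STOCK))
```

Reading the state from the class token rather than from the visible text keeps the check independent of the exact wording on the page.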
