
I expect the following code to return the text "In Stock" or "Out of Stock" (to check stock at an online store) but it returns only "[]". The XPath code was obtained from a browser's element inspector, and seems to be valid. I read online about namespaces possibly being the problem. Tips?

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//*[@id="content"]/section/section/div/font/div[7]/div/div[1]/div[2]/ul/li[1]/div/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
print(stock)

EDIT: Solution based on Padraic Cunningham's post.

Still not the most elegant due to its reliance on some absolute paths but at least this is working:

from lxml import html
import requests
import re

# in stock example URL
#url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'

# out of stock example URL
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878'

path = '//ul[@class="availability"]/li[./div[1]]'
inner_path = './div[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
current = stock[0].xpath(inner_path)

print(current[0])
if re.search(r'in.*stock.*online', current[0], flags=re.IGNORECASE):
    print("Success!")
else:
    print("Keep waiting...")

1 Answer

Your XPath is wrong:

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//ul[@class="availability"]/li[./div[@class="availability-text in-stock"]]'

page = requests.get(url)
tree = html.fromstring(page.content)

stock = tree.xpath(path)
current = stock[0].xpath('./div[@class="availability-text in-stock"]/text()')
print(current[0])
for node in stock[1:]:
    print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label'))

Which gives you:

  In Stock Online
In Stock   YORKDALE  MALL
In Stock   LAWRENCE SQUARE

The availability is in the unordered list with the availability class. Our XPath pulls all the li children that have a div with the availability-text in-stock class. Inside every one of those divs except the first, there is an anchor like:

            <a class="underline"
            aria-label="In Stock &nbsp; YORKDALE  MALL"
            title="View Store Details"
            href="#product-store-availability">
                YORKDALE  MALL</a>

You can see that the aria-label attribute contains both the availability and the store.

If you want to break it up into the availability and the store, you can split on the &nbsp; entity, which appears as "\xa0" in the parsed string:

print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label')[0].split("\xa0"))

Which would give:

['In Stock ', ' YORKDALE  MALL']
['In Stock ', ' LAWRENCE SQUARE']
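
The split above leaves padding around both halves. As a minimal sketch of tidying that up, the snippet below works on the label string alone instead of a live request. The helper name split_label is hypothetical, and html.unescape is used here only to stand in for the entity decoding that happens when the page is parsed.

```python
from html import unescape


def split_label(label):
    """Split a decoded aria-label into (availability, store)."""
    # partition() cuts at the first non-breaking space; if there is none,
    # the whole label becomes the availability and the store is empty.
    availability, _, store = label.partition("\xa0")
    return availability.strip(), store.strip()


# The raw attribute value as it appears in the page source shown above.
raw = "In Stock &nbsp; YORKDALE  MALL"
label = unescape(raw)  # "&nbsp;" becomes "\xa0"

print(split_label(label))
```

Stripping each half removes the single spaces that surround the non-breaking space, while the double space inside the store name is left untouched.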

Your browser tools are essential when it comes to scraping, but don't rely on the XPath or selector they give you when you right-click and choose copy XPath/selector. Have a look at the source and try to find ids or class names that are associated with what you are trying to parse.

If you only want the first, you can still be specific with your xpath:

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '(//ul[@class="availability"]/li/div[@class="availability-text in-stock"])[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
success = {"in","stock"}

# The text is "In Stock Online", so check that both required words appear in it
if stock and success.issubset(stock[0].lower().split()):
    print("Success")
else:
    print("Failure")
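
The paths above match the exact class availability-text in-stock, so they find nothing when a product is out of stock and the class becomes availability-text out-of-stock. One possible way to cover both cases, sketched below, is to match on the shared availability-text part with contains() and then inspect the text. The sample markup is hypothetical and only modelled on the class names mentioned in this thread, so the real page may differ; online_status is likewise an illustrative name.

```python
from lxml import html

# Hypothetical markup modelled on the class names discussed in this thread.
SAMPLE = """
<html><body>
<ul class="availability">
  <li><div class="availability-text out-of-stock">Out of Stock Online</div></li>
  <li><div class="availability-text in-stock">
    <a aria-label="In Stock &nbsp; YORKDALE  MALL">YORKDALE  MALL</a>
  </div></li>
</ul>
</body></html>
"""


def online_status(tree):
    """Return the text of the first availability div, or None if there is none."""
    # contains() matches the shared "availability-text" part of the class,
    # whichever state class (in-stock / out-of-stock) comes after it.
    path = ('(//ul[@class="availability"]/li'
            '/div[contains(@class, "availability-text")])[1]/text()')
    found = tree.xpath(path)
    return found[0].strip() if found else None


tree = html.fromstring(SAMPLE)
status = online_status(tree)
print(status)

if status and status.lower().startswith("in stock"):
    print("Success")
else:
    print("Keep waiting...")
```

With this approach the XPath only locates the element, and the in stock or out of stock decision is made on the text, so a missing element and an out-of-stock product can be told apart.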

2 Comments

The availability-text in-stock class is only present if the product is in stock. For example, in an out-of-stock case (url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878') the solution breaks, as the class becomes availability-text out-of-stock. I had thought my XPath was an absolute path that would avoid this issue.
If you don't find anything, that would mean it was out of stock, no? I will have a look when I get back to my notebook. Regardless, using a path like the one in your question is very brittle, even if it had worked.
