I'm programming a ticker to track Walmart out-of-stock status and price changes, but I'm stuck: when I try to get the ID of the item (the number at the end of the link), I can't parse it. Here is the code:

# -*- coding: utf-8 -*-

import re
import urllib2

def walmart():
    fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
    prods = fileprod.split("|")
    print prods
    lenp = len(prods)
    counter = 0
    while 1:
        while counter < lenp:
            data = urllib2.urlopen(prods[counter]).read()
            path = re.compile("class=\"Outofstock\"") #\s space - \w char - \W everything but char - 
            matching = path.match(data)
            if matching == None: 
                pass
            else:
                print "Out of stock"
            name = re.compile("\d") 
            m = name.match(str(prods[counter])).group #prods[counter] is the link
            print m


def main():
    walmart()

if __name__ == "__main__":
    main()

It throws:

  File "C:\Users\Leonardo\Desktop\BotDevelop\ticker.py", line 22, in walmart
    m = name.match(str(prods[counter])).group #prods[counter] is the link
AttributeError: 'NoneType' object has no attribute 'group'
  • You don't need to compile the regex on every loop iteration; you can do this once before the while. Also, you could rewrite "class=\"Outofstock\"" with single outer quotes, 'class="Outofstock"', so you don't need to escape the double quotes. Commented Mar 15, 2014 at 12:23
  • Just as a side note, parsing HTML with regex isn't a very good idea: stackoverflow.com/questions/1732348/… Commented Mar 15, 2014 at 12:24
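Both suggestions from the first comment fit in a few lines. This is a minimal sketch that runs under Python 2 or 3; `out_of_stock_flags` and the sample page strings are hypothetical stand-ins for the downloaded HTML, not part of the original code. It uses search() rather than match(), because the marker sits somewhere inside the page, not at its start.

```python
import re

# These two literals are the same string; single outer quotes
# just avoid escaping the inner double quotes.
ESCAPED = "class=\"Outofstock\""
SINGLE = 'class="Outofstock"'

# Compile the pattern once, before any loop, not on every iteration.
OUT_OF_STOCK = re.compile(SINGLE)


def out_of_stock_flags(pages):
    """Hypothetical helper: one True/False per page of HTML text.

    search() scans the whole page; match() would only test position 0.
    """
    return [OUT_OF_STOCK.search(page) is not None for page in pages]


# Hypothetical sample pages standing in for downloaded HTML.
sample_pages = [
    '<div class="Outofstock">Out of stock</div>',
    '<span class="camelPrice">$32.98</span>',
]
print(out_of_stock_flags(sample_pages))  # [True, False]
```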

2 Answers

You should look into BeautifulSoup, which makes parsing HTML manageable and fairly easy. Regexes usually don't handle HTML well.

To answer your question, though: the error occurs because no match was found, so match() returned None, and None has no group attribute. In general, it is better to check the result of a regex before using it:

m = name.match(str(prods[counter]))  # if no match is found, then None is returned
if m:
    m = m.group()  # be sure to call the method here

Your regular expression didn't match. You are using re.match() instead of re.search(); the former only matches at the start of a string, and your URL doesn't start with a digit:

m = name.search(str(prods[counter])).group()
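The difference shows up clearly on the example link the asker gives in a comment further down. In this short sketch (Python 2 or 3), match() fails at position 0, search() succeeds but a bare `\d` captures only a single digit, and anchoring a run of digits to the end with `\d+$` captures the whole ID. The `\d+$` pattern assumes the ID is the very last thing in the URL, with no trailing slash or query string.

```python
import re

# Example link taken from the asker's comment.
url = 'walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821'

digit = re.compile(r'\d')
# match() only tries position 0, where the URL starts with "w", so it fails.
print(digit.match(url))                 # None
# search() scans the whole string, but \d is a single character,
# so it returns only the first digit it finds.
print(digit.search(url).group())        # 7
# A run of digits anchored to the end of the string grabs the full ID.
trailing_id = re.compile(r'\d+$')
print(trailing_id.search(url).group())  # 7812821
```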

You don't need to re-compile your regular expressions in the loop either; move those out of the loops and compile them just once.

You really should not be using regular expressions to parse HTML when better tools are available. Use BeautifulSoup instead.

You should also just loop over prods directly, there is no need for a while loop there:

import urllib2
from bs4 import BeautifulSoup

fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
prods = fileprod.split("|")

for prod in prods:
    # split off last part of the URL for the product code
    product_code = prod.rsplit('/', 1)[-1]

    data = urllib2.urlopen(prod).read()
    soup = BeautifulSoup(data)
    if soup.find(class_='Outofstock'):
        print product_code, 'out of stock!'
        continue

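    price_tag = soup.find('span', class_='camelPrice')
    if price_tag is None:  # no price element on this page; avoid AttributeError
        print product_code, 'price not found'
        continue
    print product_code, price_tag.text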

For your starter URL, that prints:

7812821 $32.98
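If installing BeautifulSoup isn't an option, the standard library's HTMLParser can perform the same two checks without any regex. This is a swapped-in technique, not what the answer above uses, and only a minimal sketch: it assumes the `Outofstock` and `camelPrice` class names from the answer (Walmart's markup may well have changed since), and the sample HTML is hypothetical. It tracks span nesting so a price split across inner spans is still collected. It runs under Python 2 or 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class StockParser(HTMLParser):
    """Flags class="Outofstock" and collects text inside span.camelPrice."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.out_of_stock = False
        self.price = ''
        self.price_depth = 0  # > 0 while inside the camelPrice span

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get('class') or '').split()
        if 'Outofstock' in classes:
            self.out_of_stock = True
        if self.price_depth:
            if tag == 'span':
                self.price_depth += 1  # nested span inside the price
        elif tag == 'span' and 'camelPrice' in classes:
            self.price_depth = 1

    def handle_endtag(self, tag):
        if self.price_depth and tag == 'span':
            self.price_depth -= 1

    def handle_data(self, data):
        if self.price_depth:
            self.price += data.strip()


def parse_product(html):
    """Return (out_of_stock, price_text) for one page of HTML."""
    parser = StockParser()
    parser.feed(html)
    parser.close()
    return parser.out_of_stock, parser.price


# Hypothetical sample pages.
print(parse_product('<span class="camelPrice"><span>$32.</span><span>98</span></span>'))
print(parse_product('<div class="Outofstock">Out of stock</div>'))
```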

2 Comments

I'm parsing a link like this: walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821 and I want to get the final number...
@user3423076: yes, I see what you were trying to do with that parsing line. Splitting text is much easier.
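The split used in the answer works on the link from the comment above. A slightly more defensive sketch (Python 2 or 3; `item_id` is a hypothetical helper name) first removes any query string and trailing slash with urlparse, in case some links carry them; whether the links in walmart.txt actually do is not known here.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def item_id(url):
    """Hypothetical helper: return the last path segment of a product URL."""
    # urlparse() separates the path from any ?query or #fragment;
    # rstrip('/') drops a trailing slash so the last segment isn't empty.
    path = urlparse(url).path.rstrip('/')
    return path.rsplit('/', 1)[-1]


print(item_id('walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821'))  # 7812821
```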
