Regex throwing exception in python

Question

I'm writing a ticker that tracks Walmart out-of-stock status and price changes, but I'm stuck: when I try to get the item's ID (the number at the end of the link), I can't parse it. Here is the code:

# -*- coding: utf-8 -*-

import re
import urllib2

def walmart():
    fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
    prods = fileprod.split("|")
    print prods
    lenp = len(prods)
    counter = 0
    while 1:
        while counter < lenp:
            data = urllib2.urlopen(prods[counter]).read()
            path = re.compile("class=\"Outofstock\"") # \s space - \w char - \W everything except char
            matching = path.match(data)
            if matching == None: 
                pass
            else:
                print "Out of stock"
            name = re.compile("\d") 
            m = name.match(str(prods[counter])).group # prods[counter] is the link
            print m


def main():
    walmart()

if __name__ == "__main__":
    main()

It throws:

  File "C:\Users\Leonardo\Desktop\BotDevelop\ticker.py", line 22, in walmart
    m = name.match(str(prods[counter])).group # prods[counter] is the link
AttributeError: 'NoneType' object has no attribute 'group'

You don't need to compile the regex on every loop iteration; you can do it once, before the while loop. Also, you could write "class=\"Outofstock\"" with single outer quotes, 'class="Outofstock"', so you don't need to escape the double quotes.
– akaRem, Commented Mar 15, 2014 at 12:23
Just as a comment, parsing HTML with regex isn't a very good idea: stackoverflow.com/questions/1732348/…
– Paulo Bu, Commented Mar 15, 2014 at 12:24

Justin O Barber · Accepted Answer · 2014-03-15 12:19:54Z

3

You should look into BeautifulSoup, which makes parsing HTML manageable and rather easy. Regexes usually don't handle HTML well.

To answer your question, though: the error occurs because no match was found, so match() returned None, and None has no group attribute. In general, it is safer to check the result before using it:

m = name.match(str(prods[counter]))  # if no match is found, then None is returned
if m:
    m = m.group()  # be sure to call the method here
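As a minimal sketch of that guard, the snippet below wraps the check in a small function. The helper name `leading_digit` is hypothetical, and the sample link is the one quoted in the comments on this page. The single-argument `print(...)` calls are written so that they behave the same under Python 2 and Python 3.

```python
import re

# Same pattern as in the question: a single digit.
name = re.compile(r"\d")


def leading_digit(text):
    """Return the digit at the start of text, or None if there is none.

    Hypothetical helper, for illustration only.
    """
    m = name.match(text)  # match() only tries position 0 of the string
    if m is None:
        return None       # no match: avoid calling .group() on None
    return m.group()      # call the method; do not just reference it


url = "walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821"
print(leading_digit(url))        # None: the link starts with "w", not a digit
print(leading_digit("7812821"))  # 7
```

Returning None instead of raising lets the caller decide what to do with a link that does not match.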


Martijn Pieters · Accepted Answer · 2014-03-15 12:31:41Z

1

Your regular expression didn't match. You are using re.match() instead of re.search(); the former only matches at the start of a string:

m = name.search(str(prods[counter])).group()
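The difference can be seen on the link quoted in the comments on this page. This is only a sketch: the `\d+$` pattern (a run of digits at the very end) is an assumption added here, not something from the question, and it is included because the question's `\d` pattern on its own captures just one digit.

```python
import re

url = "walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821"

# match() anchors at the start of the string, so a link never matches \d.
print(re.match(r"\d", url))           # None

# search() scans the whole string and stops at the first digit it finds.
print(re.search(r"\d", url).group())  # 7

# Assumed pattern: all digits at the end of the link.
digits_at_end = re.compile(r"\d+$")
print(digits_at_end.search(url).group())  # 7812821
```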

You don't need to re-compile your regular expressions in the loop either; move those out of the loops and compile them just once.

You really should not be using regular expressions to parse HTML, when there are better tools available. Use BeautifulSoup instead.

You should also just loop over prods directly; there is no need for a while loop there:

import urllib2
from bs4 import BeautifulSoup

fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
prods = fileprod.split("|")

for prod in prods:
    # split off last part of the URL for the product code
    product_code = prod.rsplit('/', 1)[-1]

    data = urllib2.urlopen(prod).read()
    soup = BeautifulSoup(data)
    if soup.find(class_='Outofstock'):
        print product_code, 'out of stock!'
        continue

    price = soup.find('span', class_='camelPrice').text
    print product_code, price

For your starter URL, that prints:

7812821 $32.98


2 Comments

user3423076 Over a year ago

I'm parsing a link like this: walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821 and I want to get the final number...

Martijn Pieters Over a year ago

@user3423076: yes, I see what you were trying to do with that parsing line. Splitting text is much easier.
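For that link, the splitting approach can be sketched without any regex. The helper name `product_code` is hypothetical, and the `strip()` and `rstrip("/")` calls are assumptions added here to tolerate stray whitespace or a trailing slash in the product list; the answer's code uses only `rsplit('/', 1)[-1]`.

```python
def product_code(url):
    """Return the text after the last "/" in url (hypothetical helper)."""
    # Assumption: entries may carry surrounding whitespace or a trailing "/".
    cleaned = url.strip().rstrip("/")
    # rsplit with maxsplit=1 splits once, from the right.
    return cleaned.rsplit("/", 1)[-1]


link = "walmart.com/ip/Regalo-Easy-Open-Baby-Gate/7812821"
print(product_code(link))  # 7812821
```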
