I'm having problems with Python's re.findall method. I'm fetching a web page's HTML and trying to extract the name of a product, in this case 'bread', and print it to the console.
2 Answers
Don't use regex for HTML parsing. There are several dedicated parsers for that; I suggest BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/).
Having said that, in this particular case a regex will suffice. Just relax it a notch: there might be more or fewer spaces, or they might be tabs. So instead of literal spaces, use the whitespace class \s:
product = re.findall(r'Item:\s*is\s*in\s*lane\s*12\s*(\w*)', content)
print(product[0])
Since the '*', '+', and '?' qualifiers are all greedy (they match as much text as possible), you don't need to restrict it with [^<]*<br>.
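For anyone who wants to try this out, here is a minimal self-contained sketch. The question's actual HTML isn't shown, so the `content` string below is a made-up sample (an assumption, not the real page) that mixes spaces, a tab, and a newline to show that `\s*` tolerates all of them, and that `(\w*)` stops on its own at the `<` of `<br>`.

```python
import re

# Hypothetical sample markup; the real page's HTML is not shown in the question.
content = "<p>Item:   is in\tlane 12\n   bread<br></p>"

# \s* tolerates any mix of spaces, tabs, and newlines between the words.
pattern = re.compile(r'Item:\s*is\s*in\s*lane\s*12\s*(\w*)')

# With one capture group, findall returns a list of the captured strings.
products = pattern.findall(content)

# Guard against an empty list so a non-matching page doesn't raise IndexError.
if products:
    print(products[0])
else:
    print("no product found")
```

The guard is worth keeping: indexing `[0]` on an empty result is the usual way this kind of script blows up when the page layout changes.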
Use \s+ instead to be less dependent on the count, like "Item:\s+is in\s+lane 12\s+(\w*)". (Disclaimer: not really tested.) And while the advice not to use regex to parse HTML is good, and something like BeautifulSoup will make it easier to get at the text, if you want to extract bread from the text, you're probably going to wind up using regexes at that point anyway.
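To illustrate that last point, here is a rough sketch of the two-step approach: strip the tags first, then run the regex on the plain text. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without installing anything. The `TextExtractor` class, the `find_product` helper, and the `sample` markup are all hypothetical names and data made up for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


def find_product(html):
    """Return the word after 'lane 12' in the page text, or None if absent."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space so words from adjacent tags don't fuse.
    text = " ".join(parser.chunks)
    match = re.search(r"Item:\s+is in\s+lane 12\s+(\w*)", text)
    return match.group(1) if match else None


# Hypothetical markup where tags sit between the words of interest.
sample = "<div><b>Item:</b> is in  lane 12 <i>bread</i><br></div>"
print(find_product(sample))
```

Once the tags are gone, the regex no longer has to care about markup, but it is still a regex doing the final extraction.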