0

I'm having problems with Python's re.findall method. I'm accessing a web page's HTML and trying to extract the name of a product (in this case 'bread') and print it to the console.

4
  • 2
    Don't parse HTML with regular expressions. Many people will tell you this. Commented Apr 15, 2013 at 3:09
  • 1
    crummy.com/software/BeautifulSoup Commented Apr 15, 2013 at 3:12
  • 1
    Looks to me like you're getting the number of spaces wrong. Try \s+ instead to be less dependent on the count, like "Item:\s+is in\s+lane 12\s+(\w*)". (Disclaimer: not really tested.) The advice not to use regex to parse HTML is good, and something like BeautifulSoup will make it easier to get at the text, but if you want to extract bread from that text, you're probably going to wind up using a regex at that point anyway. Commented Apr 15, 2013 at 3:16
  • Wow DSM, that did the trick. I can't believe it just took \s+. I don't know how the spaces were incorrect; I tried over a hundred times and even copied and pasted the HTML. Thanks a lot. Commented Apr 15, 2013 at 3:23

2 Answers

3

Don't use regex for HTML parsing. There are several dedicated HTML parsers; I suggest BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/).

That said, in this particular case a regular expression will suffice. Just relax it a notch. There might be more or fewer spaces than you expect, or they might be tabs or newlines. So instead of literal spaces, use the whitespace class \s:

import re  # needed for re.findall

product = re.findall(r'Item:\s*is\s*in\s*lane\s*12\s*(\w*)', content)
print(product[0])  # parenthesized form works in Python 2 and 3

Since the '*', '+', and '?' quantifiers are all greedy (they match as much text as possible), (\w*) captures the whole word on its own, so you don't need to restrict the end of the pattern with [^<]*<br>.
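To see why the relaxed pattern helps, here is a minimal sketch. The page's real HTML isn't shown in the question, so the `content` string below (its extra spaces, tab, and newline) is a made-up assumption chosen only to illustrate the difference between literal spaces and \s*:

```python
import re

# Hypothetical fragment: the actual HTML is not shown in the question,
# so this mix of spaces, a tab, and a newline is only an assumption.
content = "<br>\nItem:  is \tin   lane 12\n   bread  <br>"

# Literal single spaces only match exactly one space character each.
strict = re.findall(r'Item: is in lane 12 (\w*)', content)

# \s* accepts any run of whitespace (spaces, tabs, newlines), including none.
relaxed = re.findall(r'Item:\s*is\s*in\s*lane\s*12\s*(\w*)', content)

print(strict)   # []
print(relaxed)  # ['bread']
```

Because findall returns a list, indexing it with [0] raises an IndexError when nothing matches, so it may be worth checking that the list is non-empty first.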


1

In case you still want to use regexps, here's a working one for your case:

product = re.findall(r'<br>\s*Item:\s+is\s+in\s+lane 12\s+(\w*)[^<]*<br>', content)

It incorporates DSM's suggestion of flexible whitespace and allows for non-word characters after (\w*) that might appear before the <br>.
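As a rough illustration of how this pattern behaves, the sketch below runs it against two invented inputs (both strings are assumptions, not the asker's real page). Note that "lane 12" still contains a literal space, so that one spot remains sensitive to the exact spacing:

```python
import re

pattern = r'<br>\s*Item:\s+is\s+in\s+lane 12\s+(\w*)[^<]*<br>'

# Hypothetical inputs, invented for illustration only.
with_trailing = "<br>  Item:  is in\tlane 12   bread (fresh)<br>"
double_space = "<br>Item: is in lane  12 bread<br>"

# [^<]* swallows the " (fresh)" that follows the captured word.
found = re.findall(pattern, with_trailing)

# Two spaces inside "lane  12" defeat the literal single space.
missed = re.findall(pattern, double_space)

print(found)   # ['bread']
print(missed)  # []
```

Replacing the literal space in "lane 12" with \s+ would make that part tolerant of spacing as well.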
