Good day. I have a small problem with a regexp.

I have a regexp that looks like this:

rexp2 = re.findall(r'<p>(.*?)</p>', data)

And I need to grab everything inside the <p> tag here:

<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>

But my code doesn't work :( What am I doing wrong?


4 Answers

4

Statutory Warning: It is a Bad Idea to parse (X)HTML using regular expressions.

Fortunately there is a better way. To get going, first install the BeautifulSoup module. Next, read up on the documentation. Third, code!

Here is one way to do what you are trying to do:

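from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

html = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

# Parse the markup with the standard library's parser backend
soup = BeautifulSoup(html, "html.parser")

# Print every <p> element, including its nested <br /> tags
for each in soup.find_all("p"):
    print(each)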


1

I wouldn't recommend using regular expressions this way. Try parsing HTML with Beautiful Soup instead and walk the DOM tree.
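
If installing Beautiful Soup is not an option, roughly the same idea can be sketched with the standard library's html.parser instead. This is only an illustration and a swapped-in technique: it is event-based rather than a real DOM walk, and the ParagraphText class name is made up for this example.

```python
from html.parser import HTMLParser

HTML = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while we are inside a <p>
        self.lines = []  # non-empty text chunks seen inside <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that appears while a <p> is open
        if self.depth:
            text = data.strip()
            if text:
                self.lines.append(text)


parser = ParagraphText()
parser.feed(HTML)
parser.close()
print(parser.lines)
```

Each <br /> splits the paragraph into separate text chunks, so the three status lines end up as three list items.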

1 Comment

OK. How can I do it with Beautiful Soup?

0

By default the dot does not match newline characters; use re.DOTALL:

re.findall(r'<p>(.*?)</p>', data, re.DOTALL)
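
To see the difference, here is a small sketch that runs the question's pattern against the question's sample markup, once without the flag and once with it. The variable names are just for illustration.

```python
import re

data = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

pattern = r'<p>(.*?)</p>'

# Without re.DOTALL the dot stops at the newline right after <p>,
# so the pattern can never reach </p>.
without_flag = re.findall(pattern, data)

# With re.DOTALL the dot also matches newlines, so the whole
# multi-line paragraph body is captured.
with_flag = re.findall(pattern, data, re.DOTALL)

print(without_flag)
print(with_flag)
```

The captured string still contains the <br /> tags and the surrounding newlines, so it needs further cleanup before use.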


0

You need to specify the re.S (re.DOTALL) flag so that the dot also matches newlines; note that re.M (multiline) only changes how ^ and $ behave and will not help here. But parsing HTML with regexps isn't a particularly good idea.

It looks like you want some stats from an OpenWrt-powered router. Why don't you write a simple CGI script that outputs the required information in a machine-readable format?
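
A minimal sketch of such a script, assuming Python is available on the router (the question does not say so, and on OpenWrt a shell or Lua script may be more practical). The function names and the JSON field names are made up for this example.

```python
#!/usr/bin/env python3
import json
import os
import socket


def collect_stats():
    """Return the hostname and load averages as a plain dict."""
    try:
        load = list(os.getloadavg())  # 1, 5 and 15 minute averages
    except (AttributeError, OSError):
        load = None  # load average is not available on this platform
    return {"hostname": socket.gethostname(), "load": load}


def render_response(stats):
    """Build a CGI response: header line, blank line, then the JSON body."""
    return "Content-Type: application/json\r\n\r\n" + json.dumps(stats)


if __name__ == "__main__":
    print(render_response(collect_stats()))
```

The client can then fetch the URL and call json.loads on the body instead of scraping the HTML status page.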

