Python Regular expression

Question

I want to seach within a string for heading tags,I'm looking for a regular expression to find the index within the document wherever a heading tag occurs, So something like:

str.index('<h*>')

Where the * would represent only 1 character ie. 1 , 2 , 3 etc. eliminating any head tags or html tags

Any help would be much appreciated.

You could use <h\d> or <h\d[^>]+> if you want to match <hn...> (eg: it has other attributes. — NullUserException
– NullUserException, Commented Aug 31, 2011 at 14:50

Gabi Purcaru · Accepted Answer · 2011-08-31 14:55:29Z

1

import re

matches = re.finditer('<h[1-6]>', your_text)
for match in matches:
    print match.start()

answered Aug 31, 2011 at 14:55

Gabi Purcaru

31.7k9 gold badges81 silver badges96 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Gabriel Ross · Accepted Answer · 2011-08-31 14:51:11Z

0

The regex you need is this:

<h.>

This will match <h1>, <h2>, <hr>, etc... If you only want to match heading tags, use:

<h\d>

answered Aug 31, 2011 at 14:51

Gabriel Ross

5,2281 gold badge32 silver badges30 bronze badges

1 Comment

NullUserException Over a year ago

Horizontal rules are not headings.

Collectives™ on Stack Overflow

Python Regular expression

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related