Python regular expression to get text between two strings

Question

When I read a text, some of its lines contain a string like <h3 class="heading">General Purpose</h3>. I want to extract only the value, General Purpose, from it.

d = re.search(re.escape('<h3 class="heading">')+"(.*?)"+re.escape('</h3>'), str(data2))
if d:
    print(d.group(0))

Can you make your question more clear? Include data2 in your question and also mention what you are trying to extract from data2.
– Mohammad Yusuf, Commented Nov 15, 2016 at 5:38
Is this an example string, or do you actually have HTML? stackoverflow.com/questions/1732348/…
– OneCricketeer, Commented Nov 15, 2016 at 6:19
I think you want d.group(1). 0 is the whole matched string, 1 is the first parenthesized group.
– roarsneer, Commented Nov 15, 2016 at 6:19

Mohammad Yusuf · Accepted Answer · 2016-11-15 07:09:36Z

4

import re

text = """<h3 class="heading">General Purpose</h3>"""
# Group 1: opening tag, group 2: inner text, group 3: closing tag
pattern = r"(<.*?>)(.*)(<.*?>)"

g = re.search(pattern, text)
g.group(2)

Output:

'General Purpose'
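
Since the question mentions that the tag appears in only some lines of the text, a minimal sketch of collecting every match with re.findall might look like the following. The sample text and the name extract_headings are assumptions made for illustration and are not part of the original answer.

```python
import re

# Match the exact opening tag from the question, capture the text lazily,
# and stop at the first closing </h3>.
HEADING_RE = re.compile(
    re.escape('<h3 class="heading">') + r"(.*?)" + re.escape("</h3>")
)


def extract_headings(text):
    """Return the text of every <h3 class="heading"> element found in text."""
    return HEADING_RE.findall(text)


# Hypothetical sample input: only some lines contain the heading tag.
sample = """<p>intro</p>
<h3 class="heading">General Purpose</h3>
<p>body</p>
<h3 class="heading">Special Purpose</h3>"""

print(extract_headings(sample))
```

Using the literal opening tag instead of a generic <.*?> keeps the pattern from matching other elements on the same line.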

Demo on Regex101

If it's a BeautifulSoup object, it's even simpler to get the value. You won't need a regex.

from bs4 import BeautifulSoup

text = """<h3 class="heading">General Purpose</h3>"""
# Name the parser explicitly to avoid the "no parser specified" warning
a = BeautifulSoup(text, "html.parser")
print(a.select('h3.heading')[0].text)

Output:

General Purpose

edited Nov 15, 2016 at 7:09

answered Nov 15, 2016 at 6:28

Mohammad Yusuf


3 Comments

Mohammad Yusuf Over a year ago

If it's already a BeautifulSoup object, you don't need an additional regex to extract the data. You can use BeautifulSoup methods to extract the HTML data.

Tim Pietzcker Over a year ago

@kattaprasanth: I wrote my answer before your comment that you're using BeautifulSoup. In that case, please remove the "accepted" checkmark from my answer and give it to this one because it's clearly the better one.

kattaprasanth Over a year ago

@TimPietzcker Actually, BeautifulSoup was returning None for that. It's working now; I am using tbody to get the required output. Thanks once again.

Tim Pietzcker · Accepted Answer · 2016-11-15 06:19:01Z

1

Group 0 contains the entire match; you want the contents of group 1:

print(d.group(1))
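
To make the difference concrete, here is a small sketch that runs the question's pattern against the sample line and prints both groups. The variable data2 below is a stand-in for the asker's input, which the question does not show.

```python
import re

# Stand-in for the asker's data2 (an assumption; the real input is not shown).
data2 = '<h3 class="heading">General Purpose</h3>'

pattern = re.escape('<h3 class="heading">') + "(.*?)" + re.escape("</h3>")
d = re.search(pattern, data2)

if d:
    whole_match = d.group(0)  # the entire match, tags included
    inner_text = d.group(1)   # only the first parenthesized group
    print(whole_match)
    print(inner_text)
```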

But generally, using regexes to parse HTML is not such a good idea (although practically speaking, nested h3 tags should be rather uncommon).
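
If a third-party parser is not available, one alternative to a regex is the standard library's html.parser module. The following is only a sketch: the class name HeadingExtractor is hypothetical, and it assumes the class attribute is exactly "heading" and that h3 elements are not nested.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text inside every <h3 class="heading"> element."""

    def __init__(self):
        super().__init__()
        self.headings = []    # finished heading texts
        self._inside = False  # True while inside a matching <h3>
        self._parts = []      # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h3" and dict(attrs).get("class") == "heading":
            self._inside = True
            self._parts = []

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._inside:
            self.headings.append("".join(self._parts))
            self._inside = False


parser = HeadingExtractor()
parser.feed('<p>intro</p><h3 class="heading">General Purpose</h3>')
parser.close()
print(parser.headings)
```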

answered Nov 15, 2016 at 6:19

Tim Pietzcker


danielpopa · Accepted Answer · 2016-11-15 08:04:54Z

1

Warning: this relies on a lookbehind. Python's re module supports it, but JavaScript did not when this was written (lookbehind was added in ES2018).

(?<=<h3 class="heading">).*?(?=</h3>)
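
A minimal sketch of how a lookaround pattern of this kind can be used from Python. The pattern here is written without backslash escapes, because current Python 3 versions raise an error for unknown escapes such as \h; the sample line is taken from the question.

```python
import re

# Lookbehind asserts the opening tag precedes the match; lookahead asserts
# the closing tag follows it. Neither tag is part of the match itself.
pattern = r'(?<=<h3 class="heading">).*?(?=</h3>)'

line = '<h3 class="heading">General Purpose</h3>'  # sample from the question
m = re.search(pattern, line)

if m:
    # With lookarounds, group(0) is already just the inner text.
    print(m.group(0))
```

Python's re module requires a lookbehind to match a fixed-width string, which holds here because the opening tag is a literal.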

answered Nov 15, 2016 at 8:04

danielpopa
