Using Python and regex, how do you remove <sup> tags from HTML? [duplicate]

Question

Using Python regex, how do I remove all <sup> tags in HTML? The tags sometimes have styling, such as below:

<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>

I would like to remove everything between and including the sup tags in a larger string of html.

Obligatory reading for OPs trying to manipulate HTML with regex: stackoverflow.com/a/1732454/3001761
– jonrsharpe, Commented Jul 2, 2014 at 14:38
I fixed my issue by converting the HTML to a string and using the following: re.sub(r'<sup+.*?sup>+','',string of html)
– user2634569, Commented Jul 2, 2014 at 14:39
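
For readers who still want the regex route from the comment above, here is a tighter sketch. The pattern in the comment is loose: `<sup+` means "<su" followed by one or more "p", so it can also start matching at a tag such as <support>, and `.` does not cross newlines by default. The version below is only an illustration for simple, non-nested <sup> elements; the sample string and the name SUP_RE are made up for this example, and a parser remains the safer choice for arbitrary HTML.

```python
import re

# Hypothetical sample input; the second tag is upper-case on purpose.
html = '<p>Total<sup style="font-size:7pt">(1)</sup> and more<SUP>2</SUP></p>'

# <sup\b     : the literal tag name, followed by a word boundary
# [^>]*>     : any attributes, up to the end of the opening tag
# .*?        : the tag's content, matched lazily (DOTALL lets it span newlines)
# </sup\s*>  : the closing tag
SUP_RE = re.compile(r"<sup\b[^>]*>.*?</sup\s*>", re.IGNORECASE | re.DOTALL)

cleaned = SUP_RE.sub("", html)
print(cleaned)  # <p>Total and more</p>
```

This removes both the tags and the text between them, which is what the question asks for. It will not cope with nested <sup> elements or with a "</sup>" that appears inside an attribute value or a comment.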

Community · Accepted Answer · 2017-05-23 10:26:14Z

6

I would use an HTML parser instead (why). For example, BeautifulSoup's unwrap() can handle your beautiful sup:

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever’s inside that tag. It’s good for stripping out markup.

from bs4 import BeautifulSoup

data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

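# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
for sup in soup.find_all('sup'):
    sup.unwrap()  # drops the <sup> tag itself but keeps its contents

print(soup.prettify())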

Prints:

<div>
(1)
</div>

edited May 23, 2017 at 10:26

answered Jul 2, 2014 at 14:39

alecxe

2 Comments

user2634569 Over a year ago

Thanks, this is much more effective. I appreciate it.

fam Over a year ago

Is there a way of removing the tags along with the content inside them? The current solution only removes the tag.
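
One way to do what this comment asks, sketched with BeautifulSoup's Tag.decompose(), which removes a tag together with everything inside it (unwrap() only removes the tag). The sample markup, the surrounding words "Revenue" and "grew", and the variable name cleaned are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a footnote marker in the middle of a sentence.
data = """
<div>
    Revenue<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup> grew.
</div>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")

# decompose() removes each <sup> tag and its contents from the tree.
for sup in soup.find_all("sup"):
    sup.decompose()

cleaned = str(soup)
print(cleaned)
```

After the loop, the "(1)" marker is gone along with its tag, and the text on either side is left in place. If the removed element is still needed afterwards, Tag.extract() detaches it from the tree and returns it instead of destroying it.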
