Regex pattern to replace html in a given text string

Question

I am trying to extract the text from the HTML snippet below. I need help with a regex pattern that removes all the HTML tags and leaves only the content.

I tried to remove the <span ...> tags using the expression below, but that didn't do the trick.

 String x = '<span style="font-size:11pt;"><span style="line-height:107%;"><span style="font-family:Calibri, sans-serif;"><strong><font color="#000000">Some normal text here...</font></strong></span></span></span>';
 String y = x.replaceAll('[<span*\b>]','');
 system.debug(y);

This prints out:

  tyle="fot-ize:11t;" tyle="lie-height:107%;" tyle="fot-fmily:Clibri, -erif;"trogfot color="#000000"Some normal text here.../fot/trog///

So it replaced each of those characters individually rather than removing the whole <span ... > tag.

Need Help

.replaceAll("<span.*?>","")

– Wiktor Stribiżew, Commented Jan 14, 2018 at 15:49

Pierre François · Accepted Answer · 2018-01-14 15:50:32Z

2

The second line of code should be:

String y = x.replaceAll('<span[^>]*>','');

The meaning of this statement is: for every occurrence of '<span' followed by zero or more (*) characters other than '>' ([^>]), followed by a single '>', replace the match with nothing.
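To make the difference concrete, here is a minimal sketch comparing the original character class with the negated-class pattern above and the lazy pattern from the comment. It is written in plain Java rather than Apex, on the assumption that Apex's String.replaceAll follows Java regex syntax. The class name, method names, and the shortened input string are hypothetical, and the \b from the original attempt is left out for simplicity.

```java
class SpanStripDemo {
    // Hypothetical sample input, shortened from the string in the question.
    static final String INPUT =
        "<span style=\"font-size:11pt;\"><strong>Some text</strong></span>";

    // Original attempt (without the \b): square brackets form a character class,
    // so every single '<', 's', 'p', 'a', 'n', '*' and '>' is removed on its own.
    static String charClass(String s) {
        return s.replaceAll("[<span*>]", "");
    }

    // Pattern from the answer: "<span", then any characters except '>', then '>'.
    static String negatedClass(String s) {
        return s.replaceAll("<span[^>]*>", "");
    }

    // Pattern from the comment: lazy ".*?" stops at the first '>' it can reach.
    static String lazyDot(String s) {
        return s.replaceAll("<span.*?>", "");
    }

    public static void main(String[] args) {
        System.out.println(charClass(INPUT));
        System.out.println(negatedClass(INPUT));
        System.out.println(lazyDot(INPUT));
    }
}
```

On this input the last two patterns give the same result. They can differ when a tag spans several lines, because '.' does not match line terminators by default while [^>] does.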

By the way, this pattern will miss the closing tag </span>. I mention this only for your information, since you didn't ask about it.
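If the closing tags should go too, one possible extension is an optional slash after the '<'. The sketch below is again plain Java under the same assumption about Apex regex syntax, and the class and method names are hypothetical. It shows a span-only variant and a broader variant that drops anything shaped like a tag, which is closer to what the question asks for. Both are simple regex approaches and may mishandle unusual markup, such as a '>' inside an attribute value.

```java
class TagStripDemo {
    // Removes opening and closing span tags. The optional '/' covers </span>,
    // and the word boundary \b keeps names like <spanner> from matching.
    static String stripSpans(String html) {
        return html.replaceAll("</?span\\b[^>]*>", "");
    }

    // Removes anything of the form <...>, leaving only the text in between.
    static String stripAllTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // The string from the question, with Java-style escaped quotes.
        String x = "<span style=\"font-size:11pt;\"><span style=\"line-height:107%;\">"
                 + "<span style=\"font-family:Calibri, sans-serif;\"><strong>"
                 + "<font color=\"#000000\">Some normal text here...</font>"
                 + "</strong></span></span></span>";
        System.out.println(stripSpans(x));
        System.out.println(stripAllTags(x));
    }
}
```

In Apex itself, the built-in String method stripHtmlTags() may be worth checking before reaching for a regex.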
