I am trying to extract the text from the HTML snippet below. I need help with a regex pattern that removes all the HTML tags and leaves only the content.

I tried to remove the <span ...> tags using the expression below, but that didn't do the trick.

 String x = '<span style="font-size:11pt;"><span style="line-height:107%;"><span style="font-family:Calibri, sans-serif;"><strong><font color="#000000">Some normal text here...</font></strong></span></span></span>';
 String y = x.replaceAll('[<span*\b>]','');
 system.debug(y);

This prints out:

  tyle="fot-ize:11t;" tyle="lie-height:107%;" tyle="fot-fmily:Clibri, -erif;"trogfot color="#000000"Some normal text here.../fot/trog///

So it basically replaced each matching character individually, not the whole <span ... > tag.

Any help would be appreciated.

  • .replaceAll("<span.*?>","") Commented Jan 14, 2018 at 15:49

1 Answer

The second line of code should be:

String y = x.replaceAll('<span[^>]*>','');

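This statement means: for every occurrence of '<span' followed by zero or more (*) characters other than '>' ([^>]) and then a single '>', replace the match with nothing. The original pattern failed because square brackets define a character class, so '[<span*\b>]' matched the individual characters <, s, p, a, n, * and > wherever they appeared, not the sequence '<span ...>'.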
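
To see what that pattern does to the string from the question, here is a minimal sketch in plain Java. It assumes Apex's replaceAll uses the same regex syntax as Java's String.replaceAll; the class name SpanOpenTagDemo and the helper stripOpeningSpans are made up for illustration.

```java
public class SpanOpenTagDemo {

    // Hypothetical helper: removes opening <span ...> tags only.
    // "<span" is matched literally, "[^>]*" consumes the attributes,
    // and ">" closes the tag.
    static String stripOpeningSpans(String html) {
        return html.replaceAll("<span[^>]*>", "");
    }

    public static void main(String[] args) {
        String x = "<span style=\"font-size:11pt;\">"
                 + "<span style=\"line-height:107%;\">"
                 + "<span style=\"font-family:Calibri, sans-serif;\">"
                 + "<strong><font color=\"#000000\">Some normal text here...</font></strong>"
                 + "</span></span></span>";

        // Prints:
        // <strong><font color="#000000">Some normal text here...</font></strong></span></span></span>
        System.out.println(stripOpeningSpans(x));
    }
}
```

The three opening span tags are gone, but everything else in the markup is still there.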

By the way, this pattern does not match the closing tag </span>, so those tags remain in the result, as do the other tags (<strong>, <font>). I mention this for your information, since you didn't ask about it.
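
If the closing tags should go as well, the pattern can be widened. The following is only a sketch in plain Java, again assuming Apex's replaceAll follows Java regex syntax; the class TagStripDemo and both helper names are hypothetical. Note that regex-based stripping is fragile on arbitrary HTML (for example, a '>' inside an attribute value would end the match early), so treat it as adequate for simple snippets like the one in the question.

```java
public class TagStripDemo {

    // Hypothetical helper: removes both <span ...> and </span>.
    // "/?" makes the slash optional, so one pattern covers both forms.
    static String stripSpanTags(String html) {
        return html.replaceAll("</?span[^>]*>", "");
    }

    // Hypothetical helper: removes every tag of the form <...>,
    // leaving only the text between tags.
    static String stripAllTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String x = "<span style=\"font-size:11pt;\">"
                 + "<span style=\"line-height:107%;\">"
                 + "<span style=\"font-family:Calibri, sans-serif;\">"
                 + "<strong><font color=\"#000000\">Some normal text here...</font></strong>"
                 + "</span></span></span>";

        // Prints: <strong><font color="#000000">Some normal text here...</font></strong>
        System.out.println(stripSpanTags(x));

        // Prints: Some normal text here...
        System.out.println(stripAllTags(x));
    }
}
```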
