
I am trying to extract the contents of the table using Regex.

I have removed most of the tags from the table, but I am stuck with <br>, <a href>, <img>, and <b>. How can I remove them?

For the <b> tag I tried this regex:

 \s*<b[^>]*>\s* 
(?<value>.*?)
 \s* </b>\s*

It worked for some lines, but for others it leaves the tags in place and gives output like this:

<b class="saadirheader">Email:</b>

Can anyone help me remove these tags?

<br>, <a href>, <img>, and <b>

Full tags:

<img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5">

<a href="mailto:[email protected]">

Thanking you,

Naveen HS

  • Do you already know strip_tags? Commented Jul 30, 2010 at 9:48
  • Also, obligatory link: stackoverflow.com/questions/1732348/… Commented Jul 30, 2010 at 9:49
  • You may also want to learn about the difference between greedy and non-greedy expressions. I.e. in <b.*?> vs <b[^>]*> Commented Jul 30, 2010 at 9:52
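
To illustrate the greedy vs. non-greedy point from the last comment, here is a minimal Python sketch. The sample string is made up for illustration and the question does not say which regex engine is in use, so treat it as an approximation of the behaviour rather than a drop-in fix.

```python
import re

# Hypothetical sample input, not taken from the original table.
html = '<b class="x">Email:</b> <b>Phone:</b>'

# Greedy: .* runs to the end of the line, then backs up to the LAST '>'.
greedy = re.findall(r"<b.*>", html)

# Non-greedy: .*? stops at the FIRST '>' after '<b'.
lazy = re.findall(r"<b.*?>", html)

# Negated class: [^>]* can never cross a '>', so it also stops at the first one.
negated = re.findall(r"<b[^>]*>", html)

print(greedy)
print(lazy)
print(negated)
```

On this input the greedy pattern returns the whole string as a single match, while the other two return only the two opening <b> tags.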

1 Answer

Use the following Regex:

(?:<br|<a href|<img|<b)(?:.(?!>))*.>

This regex matches the opening form of each tag you mentioned. If there are more tags you forgot to mention, add a "|" followed by the tag (for example "|<i") inside the first set of parentheses.
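
As a rough alternative sketch (not the pattern above), the same idea can be written with a negated character class, which also covers closing tags such as </b> and </a> and keeps a bare <b> or <br> from running on into the text that follows. The names TAG_RE and strip_selected_tags are hypothetical, Python is used only for illustration, and the sketch assumes no attribute value contains a ">" character.

```python
import re

# Hypothetical pattern: opening or closing <br>, <a>, <img>, and <b> tags.
# \b keeps "b" from also matching longer tag names such as <body>.
TAG_RE = re.compile(r"</?(?:br|a|img|b)\b[^>]*>", re.IGNORECASE)

def strip_selected_tags(html: str) -> str:
    """Remove the listed tags but keep the text between them."""
    return TAG_RE.sub("", html)

print(strip_selected_tags('<b class="saadirheader">Email:</b>'))
```

To handle more tags, add them to the alternation inside the parentheses. For anything beyond a fixed, well-formed table, an HTML parser is usually the safer choice.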
