1

I need to do a large replacement on a wp_posts.sql file: I want to remove all the <a href> and </a> tags. I am trying to do this in Vim, but I can't figure out the regular expression for it.

5
  • Welcome to Stack Overflow. Hopefully the community will be able to help shortly. Do you have specific examples of what you've tried so far? Commented Jun 13, 2011 at 22:10
  • Note that regex is not a recommended tool for HTML; a parser should be used instead. For example, a simple regex cannot tell when a tag is actually commented out, and a complex one can never cover every problem case. That being said, you may be lucky enough not to run into problems with a regex in this simple case. Commented Jun 13, 2011 at 23:11
  • Without more restrictions regex cannot be used. For example, Tomalak's answer will fail on something like <a href="javascript:f(a>b)">. Commented Jun 14, 2011 at 3:48
  • @ZyX: He's asking for a one-off regex for use in a text editor, not one for production use in some library. That's a small but significant difference. Commented Jun 14, 2011 at 5:59
  • @Tomalak: It does not mean that more restrictions are not needed. He can't use regex even in text editor if it is going to produce wrong results. Commented Jun 15, 2011 at 4:31
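To make the edge case from the comments concrete, here is a minimal sketch in Python rather than Vim. The pattern is a transliteration of a `<a[^>]+>` style regex, and the function name and sample strings are made up for illustration. It suggests that ordinary links come out fine, while a `>` inside an attribute value cuts the match short.

```python
import re

# Python transliteration of a "<a[^>]+> or </a>" style pattern.
ANCHOR_TAGS = re.compile(r"<a[^>]+>|</a>")


def strip_anchor_tags(html: str) -> str:
    """Remove opening and closing <a> tags, keeping the link text."""
    return ANCHOR_TAGS.sub("", html)


# Illustrative inputs, not taken from a real wp_posts.sql.
plain = '<p>See <a href="http://example.com">this page</a>.</p>'
tricky = '<a href="javascript:f(a>b)">click</a>'

plain_result = strip_anchor_tags(plain)
tricky_result = strip_anchor_tags(tricky)

print(plain_result)   # link text kept, tags gone
print(tricky_result)  # leftover attribute debris: the match stopped at "a>"
```

Whether this matters for a one-off edit depends on whether the dump actually contains such attributes, which a quick search for `href="javascript:` can tell you before running the substitution.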

2 Answers

5

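To remove entire <a> elements (tags and content):

:%s!<a\>[^>]*>\_.\{-}</a>!!g

Here \_. matches any character including a newline, so links that span several lines are caught as well.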

To remove just the tags (keep content):

:%s!<a\>[^>]*>\|</a>!!g

In the pattern, \> ends the tag name, so tags such as <abbr> and <article> are left alone.
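If you would rather script the change than run it inside Vim, the two substitutions can be sketched in Python roughly as follows. This is only an illustration under assumptions: the sample string is invented, and the patterns are Python approximations of the Vim ones, not exact equivalents.

```python
import re

# Whole element: non-greedy body, DOTALL so the body may span lines
# (the role \_. plays in a Vim pattern).
WHOLE_ELEMENT = re.compile(r"<a\b[^>]*>.*?</a>", re.DOTALL)

# Tags only: match the opening or the closing tag, leave the link text.
TAGS_ONLY = re.compile(r"<a\b[^>]*>|</a>")

# Invented sample with a link split across two lines and an <abbr> tag
# that must not be touched.
sample = 'Read <a href="/docs">the\ndocs</a> or the <abbr title="x">FAQ</abbr>.'

without_links = WHOLE_ELEMENT.sub("", sample)
unwrapped = TAGS_ONLY.sub("", sample)

print(repr(without_links))
print(repr(unwrapped))
```

The `\b` after `<a` plays the same part as a word boundary in Vim: it keeps `<abbr>` from being mistaken for an anchor.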


0

If you're familiar with Python, the Beautiful Soup module performs magic on HTML. It might be worth your while to look into it.
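As a rough sketch of what that could look like: the snippet below assumes the third-party `beautifulsoup4` package is installed and that the HTML has already been pulled out of the SQL dump as a plain string (the sample string is made up). `unwrap()` keeps the link text, `decompose()` drops the whole element.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Invented sample, including a ">" inside the attribute value.
html = '<p>See <a href="javascript:f(a>b)">this page</a>.</p>'

# Keep the link text, drop only the <a> tags.
soup = BeautifulSoup(html, "html.parser")
for anchor in soup.find_all("a"):
    anchor.unwrap()
unwrapped = str(soup)

# Drop the whole element, link text included.
soup = BeautifulSoup(html, "html.parser")
for anchor in soup.find_all("a"):
    anchor.decompose()
removed = str(soup)

print(unwrapped)
print(removed)
```

Because the parser understands quoted attribute values, the `>` inside the `href` should not confuse it the way it confuses a simple regex. Running this over a raw .sql file is a different matter, since quotes in the dump are escaped; treat the snippet as a starting point only.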
