I have a puzzle I need help with: I need to replace certain words with links in HTML text.

For example, I have to replace "word" with "<a href="...">word</a>".

There are two difficulties:

  • 1. not adding links inside tag attributes
  • 2. not adding links inside other links (nested links).

I found a solution that handles case (1), but I cannot handle case (2).

Here is my simplified code:

String text="sample text <a>sample text</a> sample <a href='http://www.sample.com'>a good sample</a>";
String wordToReplace="sample";
String pattern="\\b"+wordToReplace+"\\b(?![^<>]*+>)"; // the negative lookahead is there to solve problem (1)
String link="["+wordToReplace+"]"; // for clarity, the generated link is shown here as [...]

System.out.println(text.replaceAll(pattern,link));

The result is:

[sample] text <a>[sample] text</a> [sample] <a href='http://www.sample.com'>a good [sample]</a>

Problem: the last replacement puts a link inside another link.

Do you have any idea how to solve this?

Thank you in advance

1 Answer

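Parsing HTML with regex is generally a bad idea, precisely because of odd cases such as this one: a regex cannot easily tell whether a match sits inside an element that is still open. It would be better to use an HTML parser. Java ships with a built-in HTML parser as part of Swing (the javax.swing.text.html package) that you might want to look into.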
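To illustrate why tracking open elements solves case (2), here is a minimal sketch that splits the input into tags and text and only replaces words in text that is not inside an <a> element. Note that this is a simple hand-rolled tokenizer, not the Swing parser and not a real HTML parser: it assumes reasonably well-formed markup, and it does not handle comments, scripts, or a ">" inside attribute values. The class name LinkInserter and the method linkify are hypothetical names chosen for this example, and the link is written as [...] as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkInserter {
    // One tag: group(1) is "/" for a closing tag, group(2) is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String linkify(String html, String word) {
        Pattern wordPattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // Placeholder link, as in the question; quoted so it is taken literally.
        String link = Matcher.quoteReplacement("[" + word + "]");

        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int anchorDepth = 0; // number of <a> elements currently open
        int last = 0;        // end of the previously copied tag

        while (tag.find()) {
            // Text between the previous tag and this one.
            appendText(out, html.substring(last, tag.start()), anchorDepth, wordPattern, link);
            // Tags are copied verbatim, so attributes are never touched (case 1).
            out.append(tag.group());
            if (tag.group(2).equalsIgnoreCase("a")) {
                if (tag.group(1).isEmpty()) {
                    anchorDepth++;
                } else if (anchorDepth > 0) {
                    anchorDepth--;
                }
            }
            last = tag.end();
        }
        // Trailing text after the last tag.
        appendText(out, html.substring(last), anchorDepth, wordPattern, link);
        return out.toString();
    }

    private static void appendText(StringBuilder out, String text, int anchorDepth,
                                   Pattern wordPattern, String link) {
        if (anchorDepth == 0) {
            // Outside any link: safe to insert a new one.
            out.append(wordPattern.matcher(text).replaceAll(link));
        } else {
            // Inside an existing link: leave the text alone (case 2).
            out.append(text);
        }
    }

    public static void main(String[] args) {
        String text = "sample text <a>sample text</a> sample "
                + "<a href='http://www.sample.com'>a good sample</a>";
        System.out.println(linkify(text, "sample"));
    }
}
```

With the text from the question, this prints:

[sample] text <a>sample text</a> [sample] <a href='http://www.sample.com'>a good sample</a>

A real parser applies the same idea (visit text nodes, skip those with an <a> ancestor) while coping with the markup this sketch ignores.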
