1

Hi, I am trying to find a regex that helps me replace words in HTML. The problem occurs when the word I am trying to replace also appears inside an HTML tag.

Example: <img class="TEST">asd TEST asd dsa asd </img>
Here I need to match the second "TEST" only.

The regex I am looking for should be something like >[^<]*TEST, but that regex also matches the characters before the word TEST. Is it possible to select only the word TEST? Please keep other combinations in mind as well (I don't think " TEST " is a good solution, since the text could contain other characters around the word).

2
  • See stackoverflow.com/questions/1732348/… Commented Apr 21, 2011 at 13:35
  • This is a job for a parser. Search for "java html parser" and you will be on your way. Commented Apr 21, 2011 at 15:33

3 Answers

2

First of all, regex is not a good option for HTML parsing. There are plenty of full-featured HTML parsers you can use.

But if you insist on using regex, here it is:

(?<=>.*)TEST(?=.*<)

For Java, where the lookbehind needs a bounded length:

(?<=>.{0,100000})TEST(?=.{0,100000}<)

For more information on why * and + cannot be used inside a lookbehind in Java, see "Regex look-behind without obvious maximum length in Java".
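To make the bounded-lookbehind idea concrete, here is a minimal Java sketch. It is a variation on the regex above, not the same pattern: it swaps `.` for `[^<>]` so the lookarounds cannot cross a tag boundary and so that newlines between the tag and the word are allowed. The class name, the method name and the bound of 1000 characters are illustrative assumptions, and it assumes the text between tags contains no raw `<` or `>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // Assumption: no text node is longer than 1000 characters before the word.
    // Java needs an upper bound inside a lookbehind, hence {0,1000} instead of *.
    private static final Pattern TEST_IN_TEXT =
            Pattern.compile("(?<=>[^<>]{0,1000})TEST(?=[^<>]*<)");

    // Replaces every TEST that sits between a closing '>' and the next '<'.
    static String replaceInText(String html, String replacement) {
        return TEST_IN_TEXT.matcher(html)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // The TEST inside the class attribute is left alone.
        System.out.println(replaceInText(
                "<img class=\"TEST\">asd TEST asd dsa asd </img>", "X"));
        // [^<>] also matches line breaks, so this case is handled too.
        System.out.println(replaceInText("<p>\nTEST\n</p>", "X"));
    }
}
```

Because the lookarounds are zero-width, several occurrences in the same piece of text are each checked and replaced independently. This is still not a parser: comments, scripts and attribute values containing `>` would confuse it.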


2 Comments

I am not parsing the whole HTML; for that I use Jericho. I just wanted an easy way of replacing some words. I can't get your regex to work... I am testing it at myregexp.com
I like your solution, but it does not work for code like this: <p> [newLine here] TEST [newLine here] </p>
1

First of all, as has been said and will be said again, using regex for XML is usually a bad idea. But for really simple cases it can work, especially if you can live with sub-optimal results.

So, just put TEST in a capturing group and replace only that group.

Something like:

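import java.util.regex.*;
Pattern replacePattern = Pattern.compile(">[^<]*(TEST)");
Matcher matcher = replacePattern.matcher(theString);
// The group offsets are only valid after a successful find().
String result = !matcher.find() ? theString
        : theString.substring(0, matcher.start(1)) + replacement + theString.substring(matcher.end(1));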

Disclaimer: Not tested, might have some off-by-ones. But the concept should be clear.
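For reference, the same concept can be wrapped in a small self-contained class. This is only a sketch of the group idea above: the class and method names are made up for illustration, and `Pattern.quote` is added on the assumption that the word should be matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplaceDemo {
    // Replaces a single occurrence of `word` that follows a '>' with no '<' in between.
    // Because [^<]* is greedy, the occurrence replaced is the last one in that text run.
    static String replaceOneInText(String html, String word, String replacement) {
        Pattern p = Pattern.compile(">[^<]*(" + Pattern.quote(word) + ")");
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return html; // nothing to replace
        }
        // Keep everything before and after group 1, swap only the group itself.
        return html.substring(0, m.start(1)) + replacement + html.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(replaceOneInText(
                "<img class=\"TEST\">asd TEST asd dsa asd </img>", "TEST", "X"));
    }
}
```

Note that this replaces one occurrence per call, so text with several matches would need a loop or a different pattern.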


0

What if "TEST" is inside some other tag, say inside the body tag, or for that matter inside the html tag?

1 Comment

Ah, maybe I phrased it badly. I mean between '<' and '>'. It is OK if the word is in the tag's content, as in <> here </>, but not OK if it is inside the tag itself, as in < here>.
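Given that clarification, one way to express "replace only outside `<...>`" is to match either a whole tag or the word, and copy tags through unchanged. This is a different technique from the lookaround and group approaches in the answers, sketched here in Java under the assumptions that attribute values contain no `>` and that the word does not itself start with `<`; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipTagsReplaceDemo {
    // Matches either a complete tag (<...>) or the literal word.
    // Tags are tried first at each position, so a word inside a tag is
    // swallowed by the tag alternative and never replaced.
    static String replaceOutsideTags(String html, String word, String replacement) {
        Pattern p = Pattern.compile("<[^>]*>|" + Pattern.quote(word));
        Matcher m = p.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hit = m.group();
            // A hit starting with '<' is a tag: copy it through unchanged.
            String substitute = hit.startsWith("<") ? hit : replacement;
            m.appendReplacement(out, Matcher.quoteReplacement(substitute));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideTags(
                "<img class=\"TEST\">asd TEST asd dsa asd </img>", "TEST", "X"));
    }
}
```

This needs no lookbehind bound and handles line breaks and repeated occurrences, but it is still a regex shortcut and will misread comments, scripts and CDATA sections.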
