Using regex, I want to get the text between multiple HTML tags. The HTML here only represents the input format. I don't care about the tags themselves; I just want to retrieve the content between a correctly matched opening and closing tag. For instance:

Required Input:

<h1>Text 1</h1>
<h1><h2>Text 2</h2></h1>
<h1><h2>Text 3</h2>Xtra</h1>
<h1>Text 4<h1>extra</h1515></h1>
<h1><h1></h1></h1>

Required Output:

Text 1
Text 2
Text 3
None
None

Output Obtained:

Text 1
Text 2
Text 3
Text 4<h1>extra</h1515>
<h1></h1>

Regex I tried:

"<([\\S ]+)>([\\S ]+)</\\1>"

This works for the first three lines but not for the last two, which should print None.

My Java code:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        // Compile once; the pattern is the same for every line.
        Pattern r = Pattern.compile("<([\\S ]+)>([\\S ]+)</\\1>");
        while (testCases > 0) {
            String line = in.nextLine();
            String tmp = line;
            Matcher m = r.matcher(line);
            // Replace the whole line with the content of the first tag pair found, then search again.
            // (Assigning group(2) directly avoids replaceAll(line, ...), which treats the line as a regex.)
            while (m.find()) {
                line = m.group(2);
                m = r.matcher(line);
            }
            // Compare string contents with equals(), not object identity with !=.
            if (!line.equals(tmp))
                System.out.println(line);
            else
                System.out.println("None");
            testCases--;
        }
    }
}
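The obtained output comes from `[\\S ]+`: `\S` also matches `<` and `>`, so the greedy groups run straight across nested tags. For line 4, the tag name is captured as `h1`, the content as `Text 4<h1>extra</h1515>`, and the match closes on the final `</h1>`. Below is a minimal sketch of one way to get the required output, assuming "content" means plain text with no tags inside it. It forbids `<` and `>` in both the tag name and the content, and it collects every match with `find()` instead of repeatedly rewriting the line. The class name `TagContent` and the method `extract` are illustrative, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagContent {

    // Neither the tag name nor the content may contain '<' or '>', so a match can
    // never span another tag; \1 forces the closing tag to repeat the opening name.
    private static final Pattern TAG = Pattern.compile("<([^<>]+)>([^<>]+)</\\1>");

    // Returns the content of every innermost, correctly closed tag pair in the line.
    public static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<h1>Text 1</h1>",
            "<h1><h2>Text 2</h2></h1>",
            "<h1><h2>Text 3</h2>Xtra</h1>",
            "<h1>Text 4<h1>extra</h1515></h1>",
            "<h1><h1></h1></h1>"
        };
        for (String line : lines) {
            List<String> found = extract(line);
            // One line per valid pair, or "None" if the line has none.
            System.out.println(found.isEmpty() ? "None" : String.join("\n", found));
        }
    }
}
```

For the five sample lines this prints Text 1, Text 2, Text 3, None, None. A line with several valid pairs prints each one on its own line. If only the first is wanted, the `while` can become a single `if (m.find())`.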

1 Answer

As pointed out in the comments, that way lies nothing but pain. For what you're attempting to do, you would be far better off walking the DOM (Document Object Model) with a parser such as jsoup.
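The following sketch follows this answer's approach but avoids adding a dependency. It walks a DOM built by the JDK's own XML parser (`javax.xml.parsers`) instead of jsoup. That substitution is an assumption: the code treats each input line as strict XML, wrapped in an illustrative `<root>` element. A mismatched pair such as `<h1>Text 4<h1>extra</h1515></h1>` therefore makes the line fail to parse and print None. A lenient HTML parser like jsoup would likely build a tree from it anyway. "Content" is taken to mean an element whose only child is a single text node. `DomTagContent` and `extract` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DomTagContent {

    // Parses one line as XML and returns the text of every element whose only child
    // is a single text node; returns an empty list if the line is not well-formed.
    public static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Hypothetical wrapper element so several top-level tags form one document.
            Document doc = builder.parse(new InputSource(new StringReader("<root>" + line + "</root>")));
            // Start below the wrapper so bare text in the line is never reported.
            NodeList top = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < top.getLength(); i++) {
                collect(top.item(i), result);
            }
        } catch (Exception e) {
            // Malformed markup (e.g. </h1515> closing <h1>): no valid content.
            return new ArrayList<>();
        }
        return result;
    }

    // Depth-first walk: report text-only elements, otherwise descend into children.
    private static void collect(Node node, List<String> out) {
        NodeList children = node.getChildNodes();
        if (node.getNodeType() == Node.ELEMENT_NODE
                && children.getLength() == 1
                && children.item(0).getNodeType() == Node.TEXT_NODE) {
            out.add(children.item(0).getNodeValue());
            return;
        }
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "<h1>Text 1</h1>",
            "<h1><h2>Text 2</h2></h1>",
            "<h1><h2>Text 3</h2>Xtra</h1>",
            "<h1>Text 4<h1>extra</h1515></h1>",
            "<h1><h1></h1></h1>"
        };
        for (String line : lines) {
            List<String> found = extract(line);
            System.out.println(found.isEmpty() ? "None" : String.join("\n", found));
        }
    }
}
```

For the five sample lines this prints Text 1, Text 2, Text 3, None, None. Because the parser rejects a malformed line outright, one bad tag anywhere on a line discards every pair on that line. Whether that is the desired behaviour depends on the real input.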
