Using regex, I want to get the text between matching HTML tags. HTML here is only a representation of the input: I am not concerned with the tags themselves, only with the content enclosed by a correctly matched pair of opening and closing tags. For instance:
Required Input:
<h1>Text 1</h1>
<h1><h2>Text 2</h2></h1>
<h1><h2>Text 3</h2>Xtra</h1>
<h1>Text 4<h1>extra</h1515></h1>
<h1><h1></h1></h1>
Required Output:
Text 1
Text 2
Text 3
None
None
Output Obtained:
Text 1
Text 2
Text 3
Text 4<h1>extra</h1515>
<h1></h1>
Regex I tried:
"<([\\S ]+)>([\\S ]+)</\\1>"
This does not give the expected result: for the last two inputs it returns the inner markup instead of None.
My Java code:
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String line = in.nextLine();
            String tmp = line;
            Pattern r = Pattern.compile("<([\\S ]+)>([\\S ]+)</\\1>", Pattern.MULTILINE);
            Matcher m = r.matcher(line);
            while (m.find()) {
                line = line.replaceAll(line, m.group(2));
                m = r.matcher(line);
            }
            if (line != tmp)
                System.out.println(line);
            else
                System.out.println("None");
            testCases--;
        }
    }
}
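
For comparison, here is a minimal sketch of one direction that seems to produce the required output for the five sample lines. It rests on two assumptions that are not confirmed requirements: valid content never contains a `<`, so the second group becomes `[^<]+` instead of `[\\S ]+`, and only the first match on each line is wanted. The class name `TagContentSketch` and the helper `extract` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagContentSketch {
    // Assumption: content may not contain '<', so group 2 is [^<]+ rather than [\S ]+.
    // Group 1 is the tag name; \1 requires the closing tag to repeat it exactly.
    private static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns the first content found between a
    // matching open/close tag pair, or "None" if there is no such pair.
    static String extract(String line) {
        Matcher m = TAG.matcher(line);
        return m.find() ? m.group(2) : "None";
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<h1>Text 1</h1>",
            "<h1><h2>Text 2</h2></h1>",
            "<h1><h2>Text 3</h2>Xtra</h1>",
            "<h1>Text 4<h1>extra</h1515></h1>",
            "<h1><h1></h1></h1>"
        };
        for (String s : inputs) {
            System.out.println(extract(s));
        }
    }
}
```

This sketch reads the matched group directly instead of calling `line.replaceAll(line, ...)`, which treats the whole input line as a regex, and it avoids comparing strings with `!=`, which compares references rather than contents.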