I want to extract the text from an HTML file in Java.
My HTML file is:
<body>
<p>vishal</p>
<strong>patel</strong>
<bold >vishal patel
I want output like this:
vishal
patel
vishal patel
How can I do this? Please help.
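I have used a library called jsoup. Retrieving only the text from an HTML document is very simple:
Jsoup.parse(html).text();
where html is a String holding the HTML, returns the text of the document. Note that text() normalizes whitespace and returns everything on a single line, so for the file above it gives "vishal patel vishal patel" rather than one line per element.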
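To print one line per element, as in the desired output in the question, here is a minimal sketch assuming jsoup is on the classpath. It walks every element under <body> in document order and collects each element's own text (ownText()), skipping elements that have no direct text. The class name HtmlLines and the method ownTextLines are hypothetical names chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HtmlLines {

    // Hypothetical helper: returns the direct ("own") text of every element in <body>,
    // one entry per element, skipping elements with no text of their own
    // (such as <body> itself, which only contains whitespace between tags here).
    public static List<String> ownTextLines(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // getAllElements() returns the element itself plus all descendants, in document order.
        for (Element el : doc.body().getAllElements()) {
            String own = el.ownText(); // normalized text of this element only, not its children
            if (!own.isEmpty()) {
                lines.add(own);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>vishal</p>\n<strong>patel</strong>\n<bold >vishal patel";
        for (String line : ownTextLines(html)) {
            System.out.println(line);
        }
    }
}
```

One caveat of this design: ownText() only returns text that sits directly inside an element, so mixed content such as <p>Hello <b>world</b> again</p> comes out as two lines, "Hello again" and "world".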
It's better to use an HTML parser. I prefer jsoup, an open-source HTML parser for Java:
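import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.jsoup.Jsoup;

public class HTMLUtils {

    // Reads all of the HTML from the reader and returns only its text content.
    public static String extractText(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not glued together.
            sb.append(line).append('\n');
        }
        return Jsoup.parse(sb.toString()).text();
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file when we are done.
        try (FileReader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
            System.out.println(HTMLUtils.extractText(reader));
        }
    }
}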
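jsoup can also read the file itself, which avoids the manual line loop and lets you state the file's encoding explicitly. Below is a minimal sketch using Jsoup.parse(File, String), assuming the file is UTF-8 encoded; the class name HtmlFileText is hypothetical.

```java
import org.jsoup.Jsoup;

import java.io.File;
import java.io.IOException;

public class HtmlFileText {

    // Lets jsoup read and decode the file instead of concatenating lines by hand.
    // Assumption: the file is UTF-8; pass the file's real encoding if it differs.
    public static String extractText(File htmlFile) throws IOException {
        return Jsoup.parse(htmlFile, "UTF-8").text();
    }

    public static void main(String[] args) throws IOException {
        // Path reused from the example above; replace it with your own file.
        System.out.println(extractText(new File("C:/RealHowTo/topics/java-language.html")));
    }
}
```

This returns the same whitespace-normalized, single-line text as Jsoup.parse(html).text().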