
I want to extract the text from an HTML file in Java.

My HTML file is:

<body>

<p>vishal</p>
<strong>patel</strong>
<bold >vishal patel

I want output like this:

vishal 

patel

vishal patel

How can I do this? Please help.

4
  • If you want to read it from an HTML file on the web, then you should use the following tutorial: docs.oracle.com/javase/tutorial/networking/urls/… Commented Mar 9, 2012 at 9:16
  • Otherwise, do use an XML parser. By the way, you didn't close the <bold> tag. Commented Mar 9, 2012 at 9:16
  • Check jsoup Commented Mar 9, 2012 at 9:16
  • jsoup helps you extract it, but doesn't help you render it. Found this though: lobobrowser.org/cobra.jsp Commented Oct 24, 2012 at 17:33

2 Answers

23

I have used a library called JSoup.
It makes retrieving the text-only part of an HTML file very simple:

Jsoup.parse(html).text();

This gives you the text of the HTML document as a single string.


4 Comments

I want three separate texts so that I can store them in a String array, but JSoup gives me only one text ...
@user1206635 buddy, you gotta try some on your own.
@user1206635 JSoup gives you the text, you gotta do the rest. Nishant, +1 for ya!
@Vishal Android developer JSoup has numerous selectors for working with tags. You can refer to the page jsoup.org/cookbook/extracting-data/selector-syntax to check what fits you best.
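
The selector approach suggested in the last comment could look roughly like the sketch below. It assumes the three tags from the question (p, strong, and the non-standard bold, closed here for clarity) and a jsoup version that provides Elements.eachText(). The class and method names are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
final class ElementTexts {

    // Returns the text of each <p>, <strong>, and <bold> element, in document order.
    static List<String> extractTexts(String html) {
        Document doc = Jsoup.parse(html);
        // A comma in a jsoup selector matches elements that satisfy any of the parts.
        return doc.select("p, strong, bold").eachText();
    }

    public static void main(String[] args) {
        // Sample markup based on the question, with the <bold> tag closed.
        String html = "<body><p>vishal</p><strong>patel</strong><bold>vishal patel</bold></body>";
        String[] texts = extractTexts(html).toArray(new String[0]);
        for (String text : texts) {
            System.out.println(text);
        }
    }
}
```

If matched elements are nested inside each other, the inner text appears once per matching ancestor as well, so the selector may need to be narrowed for real pages.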
4

It is better to use an HTML parser. I prefer the JSoup parser (an open-source package).

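import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.jsoup.Jsoup;

public class HTMLUtils {

    // Reads all HTML from the reader and returns only its text content.
    public static String extractText(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not merged.
            sb.append(line).append('\n');
        }
        return Jsoup.parse(sb.toString()).text();
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the file even if parsing fails.
        try (FileReader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
            System.out.println(HTMLUtils.extractText(reader));
        }
    }
}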
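
As a possible shortcut, jsoup can also parse a file directly through Jsoup.parse(File, String), which removes the manual read loop. The sketch below assumes the file is UTF-8 encoded. The class name and the command-line argument are illustrative only.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
final class FileTextExtractor {

    // Parses the file as HTML (assumed to be UTF-8) and returns its text content.
    static String extractText(File file) throws IOException {
        Document doc = Jsoup.parse(file, "UTF-8");
        return doc.text();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FileTextExtractor <path-to-html-file>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

Letting jsoup read the file also lets it handle the character encoding, which FileReader would otherwise take from the platform default.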
