1

I'm trying to parse some HTML in my Android app and I need to get the text:

Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!

from the HTML shown in the original screenshot (image not included here).

Is there an easy way to get only the text and remove all HTML tags?

The behavior I need is exactly that of PHP's strip_tags function: http://php.net/manual/es/function.strip-tags.php

4
  • Do you need the text as a String? Commented Jan 12, 2018 at 13:30
  • Yes @GautamChibde, that's it. Commented Jan 12, 2018 at 13:32
  • You can do this, but without color. Commented Jan 12, 2018 at 13:34
  • You can use jsoup. Commented Jan 12, 2018 at 13:34

3 Answers

2
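import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the HTML and select the <p> elements inside the element with id "someid"
Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements paragraphs = content.getElementsByTag("p");

// text() returns the element's text with all tags removed
StringBuilder pConcatenated = new StringBuilder();
for (Element p : paragraphs) {
  pConcatenated.append(p.text());
}

System.out.println(pConcatenated); // sometext another p tag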

0

If you only need to display it, a WebView will do the job: load the HTML string into the WebView and it renders it for you.

If you need to use the text elsewhere, I don't know how to do that :D.

String data = "your html here";
WebView webview = (WebView) this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");

You can also load a web URL directly with webview.loadUrl("url");


-1

First, fetch the HTML with

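// Note: Apache HttpClient was removed from the Android SDK in API 23,
// so newer apps need the legacy library to compile this.
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

// Read the response body line by line into one string
// (UTF-8 is an assumption; use the charset your server actually sends)
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
StringBuilder str = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
reader.close();
String html = str.toString();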

Then I recommend adding a custom tag to the HTML, such as <toAndroid></toAndroid>, so that you can get the text with

String result = html.substring(html.indexOf("<toAndroid>") + "<toAndroid>".length(), html.indexOf("</toAndroid>"));

For example, this HTML

<toAndroid>Hello world!</toAndroid>

will result in

Hello world!

Note that you can place <p> tags inside the <toAndroid> tags and then remove them from the result in Java.
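
The custom-tag approach above could be wrapped in a small helper, sketched here under some assumptions: the class name TagExtractor and the methods extractBetween and stripTags are hypothetical names chosen for illustration, the tag is assumed to carry no attributes, and the regex-based tag removal is only a rough approximation of strip_tags that will not cope with malformed HTML, comments, or scripts.

```java
class TagExtractor {

    /**
     * Returns the text between the first <tag> and the following </tag>,
     * or null if either one is missing. Attributes on the tag are not handled.
     */
    static String extractBetween(String html, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = html.indexOf(open);
        if (start < 0) {
            return null; // opening tag not found
        }
        start += open.length(); // skip past the opening tag itself
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null; // closing tag not found
        }
        return html.substring(start, end);
    }

    /** Naively removes anything that looks like a tag, e.g. inner <p> elements. */
    static String stripTags(String fragment) {
        return fragment.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<html><body><toAndroid><p>Hello world!</p></toAndroid></body></html>";
        String inner = extractBetween(html, "toAndroid");
        System.out.println(stripTags(inner)); // prints "Hello world!"
    }
}
```

Returning null when a tag is missing avoids the StringIndexOutOfBoundsException that a bare substring call would throw on pages without the custom tag. For arbitrary real-world HTML, a proper parser such as jsoup is the safer choice.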
