How to extract text from a div tag in HTML in Java

Question

Hi,

I want to extract the text between the opening and closing tags of this div:

<div class="innercontenttxt"> 
<p><img border="1" align="left" height="170" width="324" vspace="3" hspace="2" src="/tmdbuserfiles/ramdev-balakrishna(1).jpg" alt="ramdev aide remanded, lakrishna acharya judicial remand, ramdev aide fake passport case, baba ramdev assistant judicial custody, balakrishna sent to judicial custody, yoga guru ramdev assistant remanded, yoga guru ramdev assistant balakrishna" />
Yoga guru Ramdev's aide Balakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court.
    <br />
    <br />
     Balakrishna Acharya, who is basically a Nepalese citizen, 
     is alleged to have submitted fake documents to procure a passport. 
     When he failed to appear in Dehradun court in connection with the case,
</p>  
</div>

After extraction, the result should be:

ramdev aide alakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court.Balakrishna Acharya, who is basically a Nepalese citizen, is alleged to have submitted fake documents to procure a passport. When he failed to appear in Dehradun court in connection with the case, the court had issued a non-bailable warrant and subsequently arrested him yesterday.

I have tried several HTML parsers, including Jericho HTML Parser, HTML Parser, and jsoup, but none of them is supported in J2ME.
– String, Commented Jul 26, 2012 at 5:44
Do you need a general solution for parsing div tags, or one specific to your case?
– sunil, Commented Jul 26, 2012 at 6:09
Is there any parser that works without the java.net.URL class for my case? Can you help me out?
– String, Commented Jul 26, 2012 at 7:28

Community · Accepted Answer · 2017-05-23 12:33:39Z

1

This problem seems similar to this other question.

Assuming you already have the HTML source stored in a String variable called htmlPage, you can locate the div with indexOf and cut out its content with substring:

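// Find the first opening <div ...> tag, then the '>' that ends it.
int divIndex = htmlPage.indexOf("<div");
divIndex = htmlPage.indexOf(">", divIndex);

// Take everything up to the first closing </div>.
// Note: indexOf returns -1 when nothing is found, and nested divs are not handled.
int endDivIndex = htmlPage.indexOf("</div>", divIndex);
String content = htmlPage.substring(divIndex + 1, endDivIndex);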
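
The snippet above returns the div's inner markup, tags included, while the question asks for plain text. A minimal sketch of one way to go the rest of the way is shown here. It assumes the div is identified by its class attribute written exactly as class="...", that it contains no nested div, and that no attribute value contains a '>' character. The names DivTextExtractor, extractDivContent, stripTags and extractDivText are hypothetical, chosen only for this illustration. It sticks to basic String and StringBuffer methods (indexOf, substring, charAt, append), which should make it easier to port to a restricted platform such as J2ME, but that is an assumption to verify against your target profile. HTML entities such as &amp; are not decoded.

```java
final class DivTextExtractor {

    // Returns the markup between the opening <div> whose tag contains
    // class="className" and the next "</div>", or null if not found.
    // Assumption: the target div has no nested <div> inside it.
    static String extractDivContent(String html, String className) {
        String marker = "class=\"" + className + "\"";
        int from = 0;
        while (true) {
            int open = html.indexOf("<div", from);
            if (open < 0) {
                return null; // no more div tags
            }
            int tagEnd = html.indexOf(">", open);
            if (tagEnd < 0) {
                return null; // malformed opening tag
            }
            String openTag = html.substring(open, tagEnd);
            if (openTag.indexOf(marker) >= 0) {
                int close = html.indexOf("</div>", tagEnd);
                if (close < 0) {
                    return null; // missing closing tag
                }
                return html.substring(tagEnd + 1, close);
            }
            from = tagEnd + 1; // not the div we want, keep searching
        }
    }

    // Drops everything between '<' and '>' and collapses runs of
    // whitespace into a single space. Leading and trailing whitespace
    // is discarded.
    static String stripTags(String markup) {
        StringBuffer out = new StringBuffer();
        boolean inTag = false;
        boolean pendingSpace = false;
        for (int i = 0; i < markup.length(); i++) {
            char c = markup.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            } else if (c == ' ' || c == '\n' || c == '\r' || c == '\t') {
                // Remember the gap, but only once some text has been emitted.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    // Convenience wrapper: plain text of the div, or null if the div is absent.
    static String extractDivText(String html, String className) {
        String content = extractDivContent(html, className);
        return content == null ? null : stripTags(content);
    }
}
```

With the question's page held in htmlPage, a call such as DivTextExtractor.extractDivText(htmlPage, "innercontenttxt") should return the div's text with well-formed tags such as <p> and <br /> removed and line breaks collapsed into single spaces. For anything beyond simple, predictable markup, a real HTML parser remains the safer choice where one is available.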

edited May 23, 2017 at 12:33

answered Jul 26, 2012 at 14:59

Telmo Pimentel Mota

maneesh · Accepted Answer · 2012-07-26 04:52:56Z

1

You may want to try one of the Java HTML parser libraries:

HTML Parser - http://htmlparser.sourceforge.net

jsoup - http://jsoup.org/

answered Jul 26, 2012 at 4:52

maneesh

String Over a year ago

Hi, I am developing this for a J2ME application, where the java.net.URL class is not available. Can anyone tell me which parsers would work without that class?
