Extract div content from htmlsource in string (Java)

Question

I'm trying to extract the content of a specific div tag (identified by its class name) from a string that contains HTML source. I think Java's regex features are not as easy to use as Perl's, right?

Has anyone done this before and can give me a piece of code? Perhaps DOM browsing is a good solution, but I didn't find any tutorials matching my problem.

Do you only need to parse the div tag, or the whole document? – Mike Caron, May 7, 2009 at 19:19
I read the whole HTML document; it's a kind of crawler. The input would be something like: <html> ... some other code ... <div class="myDiv">text I want to extract, can contain blanks, newlines and other tags</div> – Micha, May 7, 2009 at 19:27
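Since the question asks about Java's regex support, here is a minimal sketch of the regex approach with java.util.regex, assuming the class attribute is written exactly as class="myDiv" in double quotes. The class and method names (DivRegexSketch, extractDivInner) are hypothetical. DOTALL lets the match span newlines, and the lazy (.*?) stops at the first closing tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DivRegexSketch {
    // Hypothetical helper: returns the raw inner HTML of the first
    // <div ... class="className" ...> element, or null if none matches.
    // Assumes the exact attribute form class="..." with double quotes.
    static String extractDivInner(String html, String className) {
        Pattern p = Pattern.compile(
                "<div\\b[^>]*\\bclass\\s*=\\s*\"" + Pattern.quote(className) + "\"[^>]*>(.*?)</div>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL: '.' also matches newlines
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

The weakness shows up as soon as the target div contains another div: for `<div class="myDiv">a<div>b</div>c</div>` the lazy match stops at the first `</div>` and returns `a<div>b`. That is why a real parser is the safer choice for a crawler.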

A_M · Accepted Answer · 2009-05-09 19:40:19Z


You could use HTML Parser or some other HTML parsing library from this list.
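As an illustration of the parser approach without adding a dependency, the sketch below swaps in the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback) rather than the HTML Parser library named above. The helper names are hypothetical, and it assumes an exact class-attribute match (a div with class="a myDiv" would not match). A depth counter keeps nested divs inside the target from ending the capture early.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class DivTextExtractor {
    // Hypothetical helper: collects the text inside every div whose class
    // attribute equals className exactly; text runs are joined with spaces.
    static String extractDivText(String html, String className) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // > 0 while inside a target div; counts nested divs

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.DIV) {
                    if (depth > 0) {
                        depth++; // a div nested inside the target
                    } else if (className.equals(attrs.getAttribute(HTML.Attribute.CLASS))) {
                        depth = 1; // entering the target div
                    }
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    out.append(data).append(' ');
                }
            }
        };
        try {
            // ignoreCharSet = true: the input is already a Java String
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }
}
```

The JDK parser is based on an HTML 3.2-era DTD and lives in the java.desktop module, so for modern, messy pages a dedicated HTML parsing library is likely to be more forgiving.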


alphazero · Accepted Answer · 2009-05-07 23:10:51Z


Based on your comments, it sounds like you have a general case ("crawler"), and thus you're effectively parsing an XML file. If the source page is XHTML, then you have a variety of options in various XML libraries (JDOM, for example).
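For the XHTML case, a minimal sketch using the JDK's built-in DOM and XPath APIs (javax.xml.parsers and javax.xml.xpath) in place of JDOM. It assumes the page is well-formed XML, has no DOCTYPE (which could otherwise trigger a DTD download), and has no default XHTML namespace declaration; the class name XhtmlDivExtractor is hypothetical, and className is assumed not to contain a single quote since it is spliced into the XPath expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XhtmlDivExtractor {
    // Hypothetical helper: returns the concatenated text of the first
    // <div class="className"> in a well-formed XHTML string, or null if absent.
    static String extractDivText(String xhtml, String className) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Node div = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//div[@class='" + className + "']", doc, XPathConstants.NODE);
            // getTextContent() joins the text of all descendants, dropping inner tags
            return div == null ? null : div.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }
}
```

For example, `<div class="myDiv">text <b>bold</b> end</div>` yields `text bold end`. Ordinary HTML (unclosed `<p>`, bare `<br>`) will fail to parse here, which is the main limitation of treating crawled pages as XML.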
