2

I'm trying to extract the content of a specific div tag (identified by its class name) from a string that contains HTML source. I think Java's regex features are not as easy to use as Perl's, right?

Has anyone done this before and can share a piece of code? Perhaps DOM traversal is a good solution, but I haven't found any tutorials that match my problem.

3
  • Can you give an example of input and desired output? Commented May 7, 2009 at 19:18
  • do you only need to parse the div tag, or the whole document? Commented May 7, 2009 at 19:19
  • I read the whole HTML document ... it's a kind of crawler. The input would be something like: <html> ... some other code ... <div class="myDiv">text I want to extract, can contain blanks, newlines and other tags</div> Commented May 7, 2009 at 19:27

2 Answers

1

You could use HTML Parser or some other HTML parsing library from this list.
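As a rough illustration of the parser-based approach, here is a minimal sketch. It does not use the HTML Parser library mentioned above; it swaps in the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) so that it runs without extra dependencies. The class name `DivTextExtractor`, the method name `extractDivText`, and the exact-match handling of the `class` attribute are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text found inside the first-level
// <div class="..."> whose class attribute equals the given name.
class DivTextExtractor {

    static String extractDivText(String html, String className) throws IOException {
        StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // 0 = outside the wanted div; >0 = nesting depth of divs inside it
            private int depth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.DIV) {
                    return;
                }
                if (depth > 0) {
                    depth++; // nested div inside the wanted one
                } else if (className.equals(attrs.getAttribute(HTML.Attribute.CLASS))) {
                    // Assumption: exact match only; class="myDiv other" would not match.
                    depth = 1;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    out.append(data); // text of the div and of any tags nested in it
                }
            }
        };

        // The third argument tells the parser to ignore any charset declaration.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString();
    }
}
```

Calling `DivTextExtractor.extractDivText(html, "myDiv")` on the input from the comments would return the text of that div with the markup of nested tags dropped. A dedicated library will likely cope better with badly broken pages; this only shows the shape of the approach.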


0

Based on your comments, it sounds like you have a general case (a crawler), so you are effectively parsing an XML file. If the source page is XHTML, you have a variety of options among the XML libraries (JDOM, for example).
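For the XHTML case, a minimal sketch could look like the following. It uses the XML parser and XPath support built into the JDK instead of JDOM, and it assumes the input is well-formed XML without a DOCTYPE or namespace declaration (real XHTML pages usually have both and need extra handling). The names `XhtmlDivExtractor` and `extractDivText` are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the first <div> with the given class attribute
// in a well-formed XHTML string and returns its text content.
class XhtmlDivExtractor {

    static String extractDivText(String xhtml, String className) throws Exception {
        // Parse the string into a DOM tree; this throws if the input is not well-formed XML.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Assumption: className contains no quote characters, since it is
        // pasted directly into the XPath expression.
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node div = (Node) xpath.evaluate(
                "//div[@class='" + className + "']", doc, XPathConstants.NODE);

        // getTextContent() concatenates the text of the div and all nested elements.
        return div == null ? null : div.getTextContent();
    }
}
```

With an input such as `<html><body><div class="myDiv">text i want, <b>with tags</b></div></body></html>`, `XhtmlDivExtractor.extractDivText(input, "myDiv")` returns `text i want, with tags`. Pages that are not well-formed will make the parser throw, which is the main limitation of treating crawled HTML as XML.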
