
I have an HTML file like the following:

...
<span itemprop="A">234</span>
...
<span itemprop="B">690</span>
...

From this, I want to extract the values of A and B.
Can you suggest an HTML parser library for Java that can do this easily?

3 Answers


Personally, I favour JSoup over JTidy: it has CSS-like selectors, and its documentation is much better, IMHO. With JSoup, you can extract those values with the following lines:

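import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page; for a local file use Jsoup.parse(new File("your_file.html"), "UTF-8")
Document doc = Jsoup.connect("your_url").get();
Elements spans = doc.select("span[itemprop]"); // every <span> that has an itemprop attribute

for (Element span : spans) {
  System.out.println(span.text()); // prints each span's text content, e.g. 234 and 690
}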
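For a self-contained version of the snippet above, here is a minimal sketch that parses the question's markup from a string instead of fetching a URL. It assumes jsoup is on the classpath; the class and method names (ItempropExtractor, spanItempropTexts) are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ItempropExtractor {

    // Returns the text of every <span> that carries an itemprop attribute,
    // in document order.
    static List<String> spanItempropTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element span : doc.select("span[itemprop]")) {
            values.add(span.text());
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question; for a file on disk you could
        // use Jsoup.parse(new File("your_file.html"), "UTF-8") instead.
        String html = "<div><span itemprop=\"A\">234</span>"
                + "<p>other content</p>"
                + "<span itemprop=\"B\">690</span></div>";
        System.out.println(spanItempropTexts(html)); // [234, 690]
    }
}
```

Note that span[itemprop] matches any itemprop value, so spans with other properties would be included too.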

4 Comments

I don't want to extract A and B, but rather the values 234 and 690.
@vivek_jonam: Then use text() instead, which gives you the text content of the span. I've edited my answer.
OK, that works. But can I get only the values for A and B? There are other itemprop values like A1, C, E, etc.
Yes, there are two ways of doing this. 1) When you are iterating over each span element, you can check if span.attr("itemprop") equals A or B; 2) You can run two selects, one with span[itemprop=A] and the other with span[itemprop=B].
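The two approaches from the last comment can be sketched as follows. This assumes jsoup is on the classpath; byFiltering and valueOf are hypothetical helper names, and the sample markup adds A1 and C spans to show that only A and B are picked up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class SelectedItemprops {

    // Approach 1: select every span with an itemprop, then keep only A and B
    // by checking the attribute value while iterating.
    static Map<String, String> byFiltering(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element span : doc.select("span[itemprop]")) {
            String prop = span.attr("itemprop");
            if (prop.equals("A") || prop.equals("B")) {
                result.put(prop, span.text());
            }
        }
        return result;
    }

    // Approach 2: let the selector do the filtering with an attribute-value
    // match; first() returns null when nothing matches.
    static String valueOf(Document doc, String prop) {
        Element span = doc.select("span[itemprop=" + prop + "]").first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        String html = "<span itemprop=\"A\">234</span>"
                + "<span itemprop=\"A1\">111</span>"
                + "<span itemprop=\"B\">690</span>"
                + "<span itemprop=\"C\">999</span>";
        Document doc = Jsoup.parse(html);

        System.out.println(byFiltering(doc));  // {A=234, B=690}
        System.out.println(valueOf(doc, "A")); // 234
        System.out.println(valueOf(doc, "B")); // 690
    }
}
```

The first approach needs only one pass over the document; the second is shorter when you only need a couple of known properties.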

http://jsoup.org/

JSoup is the way to go.



JTidy is a confusingly named yet respected HTML parser.

