
I am trying to work with small XML files sent from the web and parse a few attributes from them. How would I approach this in JSoup? I know it is an HTML parser rather than an XML parser, but it supports XML too, and I don't have to build any handlers, builder factories and the like, as I would with DOM, SAX, etc.

Here is an example XML: LINK. I can't paste it here because it exits the code tag after every line; if someone can fix that, I would be grateful.

And here is my code:

String xml = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
// want to select first occurrence of genre tag though there is only one it 
// doesn't work without .first() - but it doesn't parse it
Element genreFromXml = doc.select("genre").first();
String genre = genreFromXml.text();
System.out.println(genre);

It results in a NullPointerException at:

String genre = genreFromXml.text();

1 Answer


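There are two issues in your code:

  1. You pass a String containing a URL where XML content is expected: Jsoup.parse(String html, String baseUri, Parser parser) treats its first argument as the markup itself and never fetches it. Use parse(InputStream in, String charsetName, String baseUri, Parser parser) instead and parse the XML from an input stream.
  2. There is no genre element in your XML; genre is an attribute of the movie element. So select("genre") matches nothing, first() returns null, and calling text() on the result throws the NPE.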

Here is how your code should look:

String url = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
// Parse the doc using an XML parser
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", "", Parser.xmlParser());
// Select the first element "movie"
Element movieFromXml = doc.select("movie").first();
// Get its attribute "genre"
String genre = movieFromXml.attr("genre");
// Print the result
System.out.println(genre);

Output:

Drama, War
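
Note that new URL(url) and openStream() in the snippet above throw checked exceptions (MalformedURLException and IOException). A minimal sketch of one way to deal with them, using only the JDK; the class XmlFetcher and its open method are hypothetical names for illustration, not part of jsoup:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of jsoup): opens the stream
// that would then be handed to Jsoup.parse(...).
class XmlFetcher {

    // MalformedURLException is a subclass of IOException,
    // so declaring IOException covers both checked exceptions.
    static InputStream open(String url) throws IOException {
        return new URL(url).openStream();
    }

    public static void main(String[] args) {
        String url = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
        // try-with-resources closes the stream when the block exits
        try (InputStream in = open(url)) {
            // Pass 'in' to Jsoup.parse(in, "UTF-8", "", Parser.xmlParser()) here.
            System.out.println("Stream opened");
        } catch (MalformedURLException e) {
            System.err.println("Invalid URL: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not read from URL: " + e.getMessage());
        }
    }
}
```

Whether to catch the exceptions like this or just add throws IOException to the calling method depends on where you want the failure reported.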

5 Comments

I have to handle MalformedURLException and IOException from URL, right? My IDE says I do, but I want to check with you.
Yes, you need to deal with the exceptions; you could simply declare them with a throws clause.
Do you know how I could get just the first word in genre, i.e. only 'Drama', without splitting the String into an array? Movies on IMDB very often have many genres assigned and I need only one.
Apart from manipulating the String the way you describe, I have no idea; IMHO it is the only way.
Alright, I did that with .split and then accessed the right index. Thank you for your help!
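
For reference, the split approach from the last comment might look roughly like this. It is only a sketch: GenreUtil and firstGenre are hypothetical names, and it assumes the attribute value is a comma-separated list such as "Drama, War".

```java
// Hypothetical helper illustrating the split approach from the comments.
class GenreUtil {

    // Returns the first entry of a comma-separated list such as "Drama, War".
    static String firstGenre(String genres) {
        // A limit of 2 stops splitting after the first comma and
        // guarantees the resulting array has at least one element.
        return genres.split(",", 2)[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(firstGenre("Drama, War")); // prints "Drama"
    }
}
```

In the answer's code this would be called as GenreUtil.firstGenre(movieFromXml.attr("genre")).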
