How can I extract values, from specific tags, from an XML file into an HTML page?

Question

I've got an XML file.

<key>457</key>
    <dict>
        <key>Track ID</key><integer>457</integer>
        <key>Name</key><string>Love me do</string>
        <key>Artist</key><string>The Beatles</string>
        <key>Album Artist</key><string>The Beatles</string>
        <key>Composer</key><string>John Lennon/Paul McCartney</string>
        <key>Album</key><string>The Beatles No.1</string>
        <key>Genre</key><string>Varies</string>
        <key>Kind</key><string>AAC audio file</string>
</dict>

I've trimmed the file heavily for this question (this is one song, and there are about 20-30 more lines of XML per song). What I'd like to do is extract the 'Artist' string from each song, remove the duplicates, and output the result to an HTML file, preferably in a way that refreshes automatically when a new version of the .xml is found, so the file stays up to date. If that overcomplicates it, a one-off conversion is fine.

I've looked into ways of doing it with jQuery, and I've had PHP suggested, but I'm unsure which is the better/cleaner option, and I'm unsure how I would go about doing it in either.

Many thanks,

Henry.

Alfo · Accepted Answer · 2012-04-09 12:07:07Z

I would do this in PHP: put your XML into a string, then (because only you are going to use this) encode it to JSON, decode it into an associative array, run a foreach loop to extract the artists, remove the duplicates, and finally save the result as an HTML file. You can then add a cron job to run this periodically and regenerate the HTML. Run this code, then link to the results that it gives out.

$contents = '<key>Blah.... lots of XML';

$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);

print_r($array);

Once I know the structure of the array that is produced, I can complete the code. But it would look something like this:

// Placeholder path: adjust it to match the structure that print_r() shows
$artists = array();
foreach($array['dict']['artist'] as $artist) {
    $artists[] = $artist;
}

// Now $artists holds an array of the artists

$artists = array_unique($artists);

// Now there are no duplicates

$output = '';
foreach($artists as $artist) {
    $output .= '<p>' . htmlspecialchars($artist) . '</p>';
}

// Now each artist is put in its own paragraph.

// Either output the output
echo $output;

// Or save it to a file (in this case, 'artists.html')

$fh = fopen('artists.html', 'w') or die("Can't open file");
fwrite($fh, $output);
fclose($fh);

This does not work completely yet, as the line in the first foreach loop needs a bit of tweaking, but this is a starting point.
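For comparison, here is a minimal sketch of the same pipeline (extract the artists, drop duplicates, build the HTML) using only Python's standard library. It is an illustration under assumptions, not a finished solution: the sample wraps the tracks in an outer `<dict>` so that the fragment is well-formed, tracks 458 and 459 are made up, and the function names `unique_artists` and `artists_to_html` are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample: three tracks wrapped in an outer <dict> (an assumption),
# following the key/value layout shown in the question.
SAMPLE_XML = """<dict>
    <key>457</key>
    <dict>
        <key>Track ID</key><integer>457</integer>
        <key>Name</key><string>Love me do</string>
        <key>Artist</key><string>The Beatles</string>
    </dict>
    <key>458</key>
    <dict>
        <key>Track ID</key><integer>458</integer>
        <key>Name</key><string>Yesterday</string>
        <key>Artist</key><string>The Beatles</string>
    </dict>
    <key>459</key>
    <dict>
        <key>Track ID</key><integer>459</integer>
        <key>Name</key><string>Angie</string>
        <key>Artist</key><string>The Rolling Stones</string>
    </dict>
</dict>"""


def unique_artists(xml_text):
    """Return artist names in first-seen order, without duplicates."""
    root = ET.fromstring(xml_text)
    artists = []
    for track in root.iter("dict"):
        children = list(track)
        # Keys and values alternate as siblings, so pair each element
        # with the one that follows it.
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text == "Artist" and value.text:
                if value.text not in artists:
                    artists.append(value.text)
    return artists


def artists_to_html(artists):
    """Render each artist as an HTML-escaped <p> element."""
    return "\n".join("<p>%s</p>" % html.escape(name) for name in artists)


print(artists_to_html(unique_artists(SAMPLE_XML)))
# prints:
# <p>The Beatles</p>
# <p>The Rolling Stones</p>
```

Pairing each `<key>` with its next sibling avoids guessing the shape of an intermediate array, which is the part the PHP loop above still needs tweaking for.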

Arseny · Accepted Answer · 2012-04-09 12:23:31Z

What exactly are you trying to achieve? If you need HTML files that are periodically regenerated from the XML files, then you probably want to write a program for it (for example, the BeautifulSoup Python library lets you parse XML/HTML files quite easily) and run it every time you need to update the HTML files. You can also set up a cron job to run it.

If you need to be able to fetch the data from the XML on the fly, you can use a JavaScript library to load the XML file and add its data to the page dynamically.
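For the periodic-regeneration route, one possible way to avoid rewriting the HTML on every scheduled run is to compare file modification times and rebuild only when the XML is newer. The sketch below assumes that an mtime comparison is good enough for your setup; `needs_rebuild`, `rebuild_if_stale`, and the `build` callback are hypothetical names, not part of any library.

```python
import os


def needs_rebuild(source_path, output_path):
    """Return True when the output is missing or older than the source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)


def rebuild_if_stale(source_path, output_path, build):
    """Call build(source_path, output_path) only when the source has changed.

    Returns True if a rebuild happened, False otherwise.
    """
    if needs_rebuild(source_path, output_path):
        build(source_path, output_path)
        return True
    return False
```

A cron job could then call `rebuild_if_stale("file.xml", "song_information.html", build)` every few minutes, where `build` is whatever function writes the HTML; runs where nothing changed return without touching the output file.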

For example, this Python program will parse an XML file (file.xml) and create an HTML file (song_information.html) that contains data from the XML file.

from BeautifulSoup import BeautifulStoneSoup

f = open("file.xml")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = open("song_information.html", "w")
f.write(html)
f.close()

It will write the following HTML to the song_information.html file:

<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
<h1>Track ID</h1>
<p>457</p>
<h1>Name</h1>
<p>Love me do</p>
<h1>Artist</h1>
<p>The Beatles</p>
<h1>Album Artist</h1>
<p>The Beatles</p>
<h1>Composer</h1>
<p>John Lennon/Paul McCartney</p>
<h1>Album</h1>
<p>The Beatles No.1</p>
<h1>Genre</h1>
<p>Varies</p>
<h1>Kind</h1>
<p>AAC audio file</p>
</body>
</html>

Of course, this is simplified. If you need Unicode support, you will want to edit it like this:

from BeautifulSoup import BeautifulStoneSoup
import codecs

f = codecs.open("file.xml", "r", "utf-8")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = codecs.open("song_information.html", "w", "utf-8")
f.write(html)
f.close()

Also, you will probably need to generate more complex HTML, so you will likely want to try a template system such as Jinja2.
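To illustrate the idea of keeping the markup apart from the code, here is a small sketch that uses `string.Template` from the standard library as a stand-in. It is not Jinja2 and has none of Jinja2's loops or filters; the `PAGE` template and the `render_page` helper are hypothetical names chosen for this example.

```python
import html
from string import Template

# The page skeleton lives in one place; $title and $body are filled in later.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
</head>
<body>
$body
</body>
</html>
""")


def render_page(title, fields):
    """Fill the page template from (label, value) pairs, escaping both."""
    body = "\n".join(
        "<h1>%s</h1>\n<p>%s</p>" % (html.escape(label), html.escape(value))
        for label, value in fields
    )
    return PAGE.substitute(title=html.escape(title), body=body)


print(render_page("Song information", [("Artist", "The Beatles")]))
```

A real template engine moves the loop itself into the template, which is where it starts to pay off once the HTML grows beyond a few lines.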

An HTML file periodically regenerated; the former. I'll take a look into that - many thanks!
