2

I've got an XML file.

<key>457</key>
    <dict>
        <key>Track ID</key><integer>457</integer>
        <key>Name</key><string>Love me do</string>
        <key>Artist</key><string>The Beatles</string>
        <key>Album Artist</key><string>The Beatles</string>
        <key>Composer</key><string>John Lennon/Paul McCartney</string>
        <key>Album</key><string>The Beatles No.1</string>
        <key>Genre</key><string>Varies</string>
        <key>Kind</key><string>AAC audio file</string>
</dict>

I've removed for these purposes a lot of the file (this is one song, and there are about 20-30 more lines of XML per song). What I'd like to do is extract the 'Artist' string from each song, and then remove all of the repeated strings, and then take that and output it into an HTML file; preferably in a way that autorefreshes when a new version of the .xml is found, thus keeping an updated file, but if that overcomplicates it, that's fine.

I've looked into ways with doing it with jQuery, and I've had PHP suggested, but I'm unsure of which is the better/cleaner; and I'm unsure how I would go about doing it in either.

Many thanks,

Henry.

2 Answers 2

1

I would do this in PHP: put your XML into a string, then (because only you are going to use this), encode it to JSON, decode it into an assoc array, then run a foreach loop to extract the artists, and finally remove the duplications, and then save it as an HTML. Then, you can add a cron job to run this periodically, and generate the HTML. Run this code, then link to the results that it gives out.

$contents = '<key>Blah.... lots of XML';

$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);

print_r($array);

Once I know the structure of the array that is produced, I can complete the code. But it would look something like this:

foreach($array['dict']['artist'] as $artist) {
    $artists[] = $artist;
}

// Now $artists holds an array of the artists

$arists = array_unique($artists);

// Now there are no duplicates

foreach($artists as $artist) {
    $output .= '<p>',$artist,'</p>';
}

// Now each artist is put in it's own paragraph.

// Either output the output
echo $output;

// Or save it to a file (in this case, 'artists.html')

$fh = fopen('artists.html', 'w') or die("Can't open file");
fwrite($fh, $output);
fclose($fh);

This does not work completely get as the line in the first foreach loop needs a bit of tweaking, but this is a starting point.

Sign up to request clarification or add additional context in comments.

Comments

1

What exactly are you trying to achieve? If you need HTML files that are periodically regenerated based on the XML files, then you probably want to write a program (for example, the BeautifulSoup Python library allows you to parse XML/HTML files quite easily) for it and run it every time you need to update the HTML files (you can also set up a cron job for it).

If you need to be able to fetch the data from XML on the fly, you can use some JavaScript library and load the XML from an xml file, then add it to the page dynamically.

For example, this Python program will parse an XML file (file.xml) and create an HTML file (song_information.html) that contains data from the XML file.

from BeautifulSoup import BeautifulStoneSoup

f = open("file.xml")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = open("song_information.html", "w")
f.write(html)
f.close()

It will write the following HTML to the song_information.html file:

<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
<h1>Track ID</h1>
<p>457</p>
<h1>Name</h1>
<p>Love me do</p>
<h1>Artist</h1>
<p>The Beatles</p>
<h1>Album Artist</h1>
<p>The Beatles</p>
<h1>Composer</h1>
<p>John Lennon/Paul McCartney</p>
<h1>Album</h1>
<p>The Beatles No.1</p>
<h1>Genre</h1>
<p>Varies</p>
<h1>Kind</h1>
<p>AAC audio file</p>
</body>
</html>

Of course, this is simplified. If you need to implement unicode support, you will want to edit it like this:

from BeautifulSoup import BeautifulStoneSoup
import codecs

f = codecs.open("file.xml", "r", "utf-8")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = codecs.open("song_information.html", "w", "utf-8")
f.write(html)
f.close()

Also, you will probably need to generate more complex HTML, so you will likely want to try some template systems like Jinja2.

1 Comment

An HTML file periodically regenerated; the former. I'll take a look into that - many thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.