
I'm a complete beginner, and I'm having some problems trying to open an XML file from a URL using Python.

Here's my code (a snippet I found on the web):

# import the library used to make HTTP requests:
from urllib.request import urlopen


# import minidom, an easy-to-use XML parser:
from xml.dom.minidom import parseString
# both modules are part of the Python 3 standard library

# download the file:
file = urlopen('http://www.odaa.dk/storage/f/2014-04-28T12%3A49%3A26.677Z/lejemaal.xml')
# read the response body (this returns bytes, not a str):
data = file.read()
# close the connection because we don't need it anymore:
file.close()
# parse the XML you downloaded:
dom = parseString(data)
# retrieve the first XML element (<tag>data</tag>) that the parser finds with the name tagName:
xmlTag = dom.getElementsByTagName('tagName')[0].toxml()
# strip off the tags (<tag>data</tag>  --->   data):
xmlData = xmlTag.replace('<tagName>', '').replace('</tagName>', '')
# print the XML element and its data in this format: <tag>data</tag>
print(xmlTag)
# print just the data
print(xmlData)

When I run this, I get an error saying:

Traceback (most recent call last):
  File "/Users/-----/PycharmProjects/First/test.py", line 20, in <module>
    xmlTag = dom.getElementsByTagName('tagName')[0].toxml()
IndexError: list index out of range

Having read similar threads here on the site, it seems like I'm trying to access something that doesn't exist. Or is it because the snippet I copied says "tagName"? Do I need to edit this?

How do I solve my problem? I'm not even sure what result I'm fishing for, as I'm just trying to get something to happen. Hopefully someone can point me in the right direction :)

  • Can you change your URL? data = file.read() has not finished after 5 minutes... Commented Jul 3, 2014 at 13:10
  • I don't quite follow. Change the URL how? This is the XML file I want to open. If I change the URL, I would be targeting another file, right? Commented Jul 3, 2014 at 15:21
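Regarding the download that never finishes: below is a minimal sketch, assuming Python 3, of the same download-and-parse steps wrapped in a with block and given a timeout. The function name download_xml and the 30-second value are hypothetical choices for illustration. The timeout applies to each blocking socket operation, so a server that stops responding raises an exception instead of hanging; it does not limit the total time of a slow but steady download.

```python
from urllib.request import urlopen
from xml.dom.minidom import parseString


def download_xml(url, timeout=30):
    """Download an XML document and return it parsed as a minidom Document.

    timeout is in seconds (30 is an arbitrary, assumed value). It bounds each
    blocking socket operation, so a stalled server raises an exception
    instead of hanging forever; it does not cap the total download time.
    """
    # The with block closes the connection even if reading fails.
    with urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes, not str
    # parseString accepts bytes and reads the encoding from the XML declaration.
    return parseString(data)
```

For example: dom = download_xml('http://www.odaa.dk/storage/f/2014-04-28T12%3A49%3A26.677Z/lejemaal.xml')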

1 Answer


In fact, the code you have already does the job (untested).

The problem is simply that there is no element named 'tagName' in your XML file, so getElementsByTagName returns an empty list.

You then try to get the first element of this empty list, hence the IndexError.
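A minimal reproduction of that failure, using a small made-up XML string in place of the real file (the tags root and row here are assumptions, not taken from the downloaded document):

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the downloaded file.
SAMPLE = "<root><row>first</row><row>second</row></root>"
dom = parseString(SAMPLE)

# No element is called 'tagName', so this is an empty NodeList, not an error.
missing = dom.getElementsByTagName('tagName')
print(len(missing))  # 0

# Indexing the empty list is what raises the IndexError seen in the traceback.
try:
    missing[0]
except IndexError as exc:
    print("IndexError:", exc)

# An element name that does exist gives a non-empty list.
print(len(dom.getElementsByTagName('row')))  # 2
```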

You should try replacing tagName with the name of a tag present in your XML document, such as 'row'.
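Once the name is right, a slightly more defensive version of the lookup can check the list before indexing and read the element's text directly, rather than stripping tags with replace(), which stops working if the tag carries attributes (e.g. <row id="1">). This is a sketch, assuming the element of interest holds plain text; first_text is a hypothetical helper name and the sample XML is invented.

```python
from xml.dom.minidom import parseString


def first_text(dom, tag_name):
    """Return the text of the first <tag_name> element, or None if there is none.

    Only text (and CDATA) nodes directly inside the element are joined; text
    inside nested child elements is ignored.
    """
    elements = dom.getElementsByTagName(tag_name)
    if not elements:  # empty NodeList: the tag does not occur in the document
        return None
    return "".join(
        child.data
        for child in elements[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Made-up sample document; the real file's structure may differ.
dom = parseString('<root><row id="1">first</row><row>second</row></root>')
print(first_text(dom, 'row'))      # first
print(first_text(dom, 'tagName'))  # None
```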

You generally know which tags your XML file contains because you know its structure. You can also use Python to retrieve a list of them programmatically with the following code:

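root = dom.documentElement
for node in root.childNodes:
    # skip the whitespace text nodes between elements; they have no tagName
    if node.nodeType == node.ELEMENT_NODE:
        print(node.tagName)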

This should print the tag name of every element directly under the root element of your document (the outermost element, which contains all the others).
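That loop only inspects the direct children of the root. To see every tag used anywhere in the file, here is a sketch that counts element names at any depth, assuming the document is already parsed into dom. It relies on minidom's getElementsByTagName('*'), which matches all elements; tag_counts is a hypothetical helper name and the sample XML is invented.

```python
from collections import Counter
from xml.dom.minidom import parseString


def tag_counts(dom):
    """Count elements by tag name over the whole document, at any depth."""
    # '*' matches every element; minidom walks the tree in document order.
    return Counter(element.tagName for element in dom.getElementsByTagName('*'))


# Made-up sample; with the real file, pass the dom parsed from the download.
dom = parseString("<root><row><name>a</name></row><row><name>b</name></row></root>")
for tag, count in tag_counts(dom).most_common():
    print(tag, count)
```

The counts also hint at structure: a tag that appears many times (like row here) is usually the repeating record you want to loop over.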


1 Comment

Cool, thanks. But how do I know what tags are present in the XML file? Isn't this metadata? How would I acquire this information?
