Parsing XML File with Python, while extracting Attributes and Children

Question

I'm trying to read an XML file in Python whose general format is as follows:

<item id="1149" num="1" type="topic">
    <title>Afghanistan</title>
    <additionalInfo>Afghanistan</additionalInfo>
</item>

(This snippet repeats many times.)

I'm trying to get the id value and the title value printed into a file, but I'm having trouble getting the XML file into Python in the first place. This is what I'm currently doing to get the XML file:

import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
f = open('out.xml', 'w')
f.write(response)

However, whenever I run this code, I get the following error:

Traceback (most recent call last):
  File "python", line 9, in <module>
TypeError: expected a character buffer object

This makes me think that I'm not using something that can handle XML. Is there any way to save the XML to a file, and then extract the title of each section along with the id attribute associated with that title? Thanks for the help.
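Because the snippet repeats many times, keep in mind that a well-formed XML document has exactly one root element, so the repeated <item> elements have to sit inside some enclosing element before ElementTree can parse them. The Python 3 sketch below assumes a hypothetical <list> wrapper (the real response's root tag may differ), and the second <item> is made up for illustration. It reads the id attribute with get() and the title child with findtext().

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the question's <item> snippet repeated inside an
# assumed <list> root element (XML allows only one root per document).
# The second <item> is invented for illustration.
SAMPLE = """<list>
<item id="1149" num="1" type="topic">
    <title>Afghanistan</title>
    <additionalInfo>Afghanistan</additionalInfo>
</item>
<item id="1150" num="2" type="topic">
    <title>Africa</title>
    <additionalInfo>Africa</additionalInfo>
</item>
</list>"""

root = ET.fromstring(SAMPLE)
for item in root.findall("item"):
    # id is an attribute of <item>; title is the text of a child element
    print(item.get("id"), item.findtext("title"))
```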

Reza-S4 · Accepted Answer · 2014-07-22 19:48:37Z

1

You can read the content of the response with this code:

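import urllib2
# Build an opener that follows redirects and keeps cookies between requests
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(), urllib2.HTTPCookieProcessor())
# read() returns the response body as a string
response = opener.open("http://api.npr.org/list?id=3002").read()
opener.close()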

and then write it to a file:

f = open('out.xml', 'w')
f.write(response)
f.close()
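On Python 3, urllib2 was split up, and urllib.request still provides build_opener, HTTPRedirectHandler and HTTPCookieProcessor. A rough Python 3 equivalent of the two steps above is sketched below; fetch_to_file is a hypothetical helper name. The body comes back as bytes, so the file is opened in binary mode ('wb') rather than 'w'.

```python
import urllib.request


def fetch_to_file(url, path):
    """Hypothetical helper: download url and save the raw bytes to path."""
    # Same handlers as the urllib2 version: follow redirects, keep cookies
    opener = urllib.request.build_opener(
        urllib.request.HTTPRedirectHandler(),
        urllib.request.HTTPCookieProcessor(),
    )
    # read() returns bytes on Python 3, hence binary mode below
    with opener.open(url) as response:
        body = response.read()
    with open(path, "wb") as f:
        f.write(body)
    return body
```

Usage would look like fetch_to_file("http://api.npr.org/list?id=3002", "out.xml"); whether that endpoint still responds is not something this sketch checks.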

Jack · Accepted Answer · 2014-07-22 19:47:43Z

What you want is response.read(), not response. The response variable is a file-like response object, not the XML string; calling response.read() reads the XML out of it.
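To see the difference between the response object and its content without touching the network, the Python 3 sketch below uses io.BytesIO as a stand-in for the object urlopen() returns (an assumption: both are file-like objects with a read() method). It also shows a common gotcha: read() consumes the stream, so a second call returns an empty result. Store the result in a variable if you need it more than once.

```python
import io

# Stand-in for the object returned by urlopen(): file-like, not a string
response = io.BytesIO(b'<list><item id="1149"><title>Afghanistan</title></item></list>')

xml_bytes = response.read()  # the actual XML content, as bytes
again = response.read()      # the stream is already exhausted

print(type(response).__name__)  # BytesIO: the wrapper object, not the XML
print(xml_bytes[:6])            # b'<list>'
print(again)                    # b''
```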

You can then write it directly to a file like so:

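from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002'  # 1007 is science
response = urlopen(url)
# read() returns the body as a string, which file.write() accepts
with open('out.xml', 'w') as f:
    f.write(response.read())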
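If the response is large, shutil.copyfileobj can copy a file-like object to disk in chunks instead of holding the whole body in one string, and ET.parse can then read the saved file back for extraction. A Python 3 sketch, again with io.BytesIO standing in for the urlopen() result; save_and_parse is a hypothetical helper name, and the temp-directory path is used only so the example runs anywhere.

```python
import io
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def save_and_parse(response, path):
    """Hypothetical helper: stream a file-like response to path, then parse it."""
    # Copy in chunks, in binary mode, since the response yields bytes
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    # ET.parse() returns an ElementTree; getroot() gives its top element
    return ET.parse(path).getroot()


# io.BytesIO stands in for the object urlopen() returns
fake_response = io.BytesIO(
    b'<list><item id="1149" num="1" type="topic">'
    b"<title>Afghanistan</title></item></list>"
)
path = os.path.join(tempfile.gettempdir(), "out.xml")
root = save_and_parse(fake_response, path)
for item in root.findall("item"):
    print(item.get("id"), item.findtext("title"))
```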

Alternatively, you could parse it directly with ElementTree:

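import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002'  # 1007 is science
response = urlopen(url)
root = ET.fromstring(response.read())  # fromstring returns the root Element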
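Two details of the ElementTree API are worth knowing here. ET.fromstring returns an Element (the root element), not an ElementTree object, and ET.parse accepts a file-like object as well as a filename, so the response could be handed to it directly without calling read(). The Python 3 sketch below uses io.BytesIO as a stand-in for the urlopen() result.

```python
import io
import xml.etree.ElementTree as ET

data = b'<list><item id="1149"><title>Afghanistan</title></item></list>'

# fromstring() parses bytes/str and returns the root Element
root = ET.fromstring(data)

# parse() reads from a file-like object and returns an ElementTree
tree = ET.parse(io.BytesIO(data))

print(type(root).__name__, root.tag)            # Element list
print(type(tree).__name__, tree.getroot().tag)  # ElementTree list
```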

To extract all of the id/title pairs, you could do the following:

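import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002'  # 1007 is science
response = urlopen(url)
root = ET.fromstring(response.read())
# id is an attribute of each <item>; title is the text of its <title> child
for item in root.findall("item"):
    print item.get("id")
    print item.find("title").text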
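Since the question asks for the id and title values to end up in a file, here is a Python 3 sketch that writes them as CSV rows with the standard csv module. write_pairs is a hypothetical helper name and the sample XML is a cut-down version of the question's snippet; findtext() returns the given default instead of failing when an <item> has no <title>.

```python
import csv
import io
import xml.etree.ElementTree as ET


def write_pairs(root, out):
    """Hypothetical helper: write one id,title row per <item> under root."""
    writer = csv.writer(out)
    writer.writerow(["id", "title"])
    for item in root.findall("item"):
        # get() reads the attribute; findtext() reads the child's text ("" if missing)
        writer.writerow([item.get("id"), item.findtext("title", default="")])


root = ET.fromstring(
    '<list><item id="1149" num="1" type="topic">'
    "<title>Afghanistan</title></item></list>"
)
buffer = io.StringIO()
write_pairs(root, buffer)
print(buffer.getvalue())
```

To write to disk instead of a StringIO, pass a file opened with open("pairs.csv", "w", newline=""), as the csv module documentation recommends.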

From there, you can decide where to store or output the values.
