I'm trying to read an XML file in Python whose general format is as follows:

<item id="1149" num="1" type="topic">
    <title>Afghanistan</title>
    <additionalInfo>Afghanistan</additionalInfo>
</item>

(This snippet repeats many times.)

I'm trying to write the id value and the title value of each item to a file. Currently, I'm having trouble getting the XML into Python. This is how I'm fetching it:

import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
f = open('out.xml', 'w')
f.write(response)

However, whenever I run this code, I get the following error:

Traceback (most recent call last):
  File "python", line 9, in <module>
TypeError: expected a character buffer object

This makes me think that I'm not using something that can handle XML. Is there a way to save the XML to a file, then extract the title of each item along with the id attribute associated with that title? Thanks for the help.

2 Answers

1

You can read the content of the response with this code:

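import urllib2

# Build an opener that follows redirects and keeps cookies
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(), urllib2.HTTPCookieProcessor())
# read() returns the response body as a string
response = opener.open("http://api.npr.org/list?id=3002").read()
opener.close()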

and then write it to a file:

f = open('out.xml', 'w')
f.write(response)
f.close()

0

What you want is response.read(), not response. The response variable is a file-like response object, not the XML string; calling response.read() reads the XML out of it.

You can then write it directly to a file like so:

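from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
# read() returns the response body as a string; the with block closes the file
with open('out.xml', 'w') as f:
    f.write(response.read())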
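
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same idea might be sketched as follows. The helper name save_response is hypothetical, and an in-memory io.BytesIO object stands in for the HTTP response so the example runs without network access. In Python 3, read() returns bytes, so the file is opened in binary mode.

```python
import io
import os
import tempfile


def save_response(response, path):
    """Write the body of a file-like response object to path; return the byte count."""
    data = response.read()          # bytes in Python 3
    with open(path, 'wb') as f:     # binary mode, because data is bytes
        f.write(data)
    return len(data)


# Stand-in for the object returned by urlopen(url): anything with read() works.
# The <list> wrapper around the item is an assumption, not the real API output.
SAMPLE_BODY = b'<list><item id="1149"><title>Afghanistan</title></item></list>'
fake_response = io.BytesIO(SAMPLE_BODY)

out_path = os.path.join(tempfile.mkdtemp(), 'out.xml')
written = save_response(fake_response, out_path)
```

With a real request, this would be called as save_response(urlopen(url), 'out.xml') after importing urlopen from urllib.request.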

Alternatively, you could skip the file and parse the response directly with ElementTree like so:

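import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
tree = ET.fromstring(response.read())  # fromstring() returns the root Element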

To extract all of the id/title pairs, you could do the following:

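import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
tree = ET.fromstring(response.read())
# Visit each <item> that is a direct child of the root element
for item in tree.findall("item"):
    print item.get("id")           # the id attribute
    print item.find("title").text  # the text inside <title>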

From there you can decide where to store or output the values.
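
As one possible way to finish the task from the question, the sketch below writes each id/title pair to a text file. It is written for Python 3 and parses an inline sample instead of the live feed. The <list> wrapper element, the second item, the tab-separated output format, and the names extract_pairs and write_pairs are all assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample modeled on the snippet in the question. The <list> root and the
# second item are made up for illustration, not taken from the real API.
SAMPLE_XML = """<list>
  <item id="1149" num="1" type="topic">
    <title>Afghanistan</title>
    <additionalInfo>Afghanistan</additionalInfo>
  </item>
  <item id="9999" num="2" type="topic">
    <title>Hypothetical Topic</title>
    <additionalInfo>Hypothetical Topic</additionalInfo>
  </item>
</list>"""


def extract_pairs(xml_text):
    """Return a list of (id, title) tuples, one per <item> child of the root."""
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.findall('item'):
        title = item.find('title')
        # Skip items without a <title> child instead of failing on None.
        if title is not None:
            pairs.append((item.get('id'), title.text))
    return pairs


def write_pairs(pairs, path):
    """Write one tab-separated id/title pair per line."""
    with open(path, 'w', encoding='utf-8') as f:
        for item_id, title in pairs:
            f.write('{}\t{}\n'.format(item_id, title))


pairs = extract_pairs(SAMPLE_XML)
out_path = os.path.join(tempfile.mkdtemp(), 'titles.txt')
write_pairs(pairs, out_path)
```

Against the live feed, xml_text would be the string or bytes returned by response.read(); if the real items are nested deeper than the root, the findall path would need adjusting.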
