I was running the following code in Python:
import xml.etree.ElementTree as ET
tree = ET.parse('dplp_11.xml')
root = tree.getroot()
f = open('workfile', 'w')
for country in root.findall('article'):
    rank = country.find('year').text
    name = country.find('title').text
    if(int(rank)>2009):
        f.write(name)
        auth = country.findall('author')
        for a in auth:
            #print str(a)
            f.write(a.text)
            f.write(',')
        f.write('\n')
I got an error:
Traceback (most recent call last):
  File "parser.py", line 14, in <module>
    f.write(a.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)
I was trying to parse the dblp data which looks like this:
<?xml version="1.0"?>
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
<article mdate="2015-07-14" key="journals/acta/BozapalidisFR12">
<author>Symeon Bozapalidis</author>
<author>Zoltán Fülöp 0001</author>
<author>George Rahonis</author>
<title>Equational weighted tree transformations.</title>
<pages>29-52</pages>
<year>2012</year>
<volume>49</volume>
<journal>Acta Inf.</journal>
<number>1</number>
<ee>http://dx.doi.org/10.1007/s00236-011-0148-5</ee>
<url>db/journals/acta/acta49.html#BozapalidisFR12</url>
</article>
</dblp>
How can I resolve it?
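It is the f.write() line that throws the exception. The XML parsing is not the issue here; writing to the text file is what causes the problem. In Python 2, writing a unicode string to a file opened with the built-in open() implicitly encodes it with the ASCII codec, which cannot represent characters such as á. f.write(u'Zolt\xe1n') would give you the exact same error.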
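One possible way around this is to open the output file with an explicit encoding, so that unicode text is encoded as UTF-8 instead of ASCII. The sketch below is only an illustration under a few assumptions: UTF-8 is the encoding you want in the output, the SAMPLE bytes are a cut-down stand-in for the real dblp file, and write_recent is a hypothetical helper name that does not appear in the question. It uses io.open, which accepts an encoding argument in both Python 2 and Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down stand-in for the dblp file (assumption: the real file is UTF-8).
# Non-ASCII characters are written as escapes so the script itself stays ASCII.
SAMPLE = u"""<dblp>
<article>
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<year>1996</year>
</article>
<article>
<author>Symeon Bozapalidis</author>
<author>Zolt\xe1n F\xfcl\xf6p 0001</author>
<author>George Rahonis</author>
<title>Equational weighted tree transformations.</title>
<year>2012</year>
</article>
</dblp>""".encode("utf-8")


def write_recent(root, path, after_year=2009):
    """Hypothetical helper: write title and authors of articles newer than after_year."""
    # io.open encodes unicode text as UTF-8 on write, so no ASCII fallback happens.
    with io.open(path, "w", encoding="utf-8") as f:
        for article in root.findall("article"):
            if int(article.find("year").text) > after_year:
                title = article.find("title").text
                authors = u",".join(a.text for a in article.findall("author"))
                # Build one unicode line: title, then each author followed by a comma.
                f.write(u"%s%s,\n" % (title, authors))


root = ET.fromstring(SAMPLE)
write_recent(root, "workfile")
```

With the real data you would pass ET.parse('dplp_11.xml').getroot() instead of parsing SAMPLE. Building the whole line as a unicode string before writing matters on Python 2, where a file returned by io.open rejects plain byte strings.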