
I am trying to convert an XML file to CSV, but the XML file is encoded as ISO-8859-1 and apparently contains characters that the ASCII codec, which Python uses to write the rows, cannot encode.

I get the error:

Traceback (most recent call last):
  File "convert_folder_to_csv_PLAYER.py", line 139, in <module>
    xml2csv_PLAYER(filename)
  File "convert_folder_to_csv_PLAYER.py", line 121, in xml2csv_PLAYER
    fout.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)

I have tried opening the file as follows: dom1 = parse(input_filename.encode("utf-8"))

and I have tried replacing the \xe1 character in each row before it is written. Any suggestions?

1 Answer

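The XML parser returns unicode objects, and that's a good thing. The problem is that the csv module in Python 2 can't deal with them: it tries to encode each unicode field with the default ASCII codec, which fails on characters such as u'\xe1'.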
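To see where the error comes from, here is a minimal sketch (it runs on both Python 2 and 3) that encodes the character from the traceback, u'\xe1', explicitly. Encoding it with ASCII raises the same kind of UnicodeEncodeError the csv writer hits implicitly, while UTF-8 or ISO-8859-1 handle it. The variable names are only illustrative.

```python
text = u"\xe1"  # the character from the traceback (a with acute accent)

# ASCII only covers code points 0-127, so this raises UnicodeEncodeError,
# essentially what the Python 2 csv writer runs into implicitly.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly with UTF-8 works and yields two bytes: 0xC3 0xA1.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 (the XML file's encoding) represents it as a single byte.
latin1_bytes = text.encode("iso-8859-1")

print(ascii_failed, repr(utf8_bytes), repr(latin1_bytes))
```

So the fix is to choose the encoding yourself instead of letting Python fall back to ASCII.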

You could encode each unicode string returned by the XML parser before handing it to the csv writer, but a better idea is to use this UnicodeWriter recipe from the official documentation of the csv module:

import csv, codecs, cStringIO  # Python 2 only: cStringIO does not exist in Python 3

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        # Python 2's csv writer only accepts byte strings, so encode each cell first
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
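The recipe above targets Python 2. On Python 3 the csv module works with str directly, so no wrapper is needed: you open the output file with the encoding you want and pass it to csv.writer. A minimal sketch, assuming Python 3 and made-up sample rows; the temporary path and file name are hypothetical stand-ins for your real output file.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII characters.
rows = [[u"Jos\xe9", u"Garc\xeda"], [u"Ana", u"P\xe9rez"]]

path = os.path.join(tempfile.mkdtemp(), "players.csv")  # hypothetical output path

# The file object does the encoding; newline="" is what the csv docs
# recommend for files handed to csv.writer.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the raw bytes back to check what was written.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

To match the XML file instead of UTF-8, you could pass encoding="iso-8859-1" to open().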

1 Comment

Thank you for your answer. I struggled with the UnicodeWriter class because I am new to Python and had some trouble with the stream, but I followed your first suggestion and fixed my problem with one line of code: fout.writerow([s.encode("utf-8") for s in row])
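The one-line fix calls .encode() on every cell, so it would fail with an AttributeError if a row ever contains a non-text value such as an int. A slightly more defensive sketch for Python 2's csv.writer is below; encode_row is a hypothetical helper name, not part of the csv module. It is meant only for Python 2: on Python 3, csv.writer would write the bytes as b'...' literals, so there you pass str cells unchanged.

```python
def encode_row(row, encoding="utf-8"):
    """Encode text cells to bytes for the Python 2 csv writer (hypothetical helper).

    Non-text cells (ints, None, existing byte strings) are passed through
    unchanged so the writer can convert them itself.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


encoded = encode_row([u"Jos\xe9", 7, None])
print(encoded)
```

In the question's script this would be used as fout.writerow(encode_row(row)).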
