
Afternoon,

I am having some trouble with a Python script that exports a SQLite database to CSV. I have searched high and low for an answer, but none of the solutions I found have worked for me, or I am getting the syntax wrong.

I want to replace characters in the SQLite data that fall outside the ASCII range (code points 128 and above).

Here is the script I have been using:

#!/opt/local/bin/python
import sqlite3
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f", 
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

conn = sqlite3.connect('test.db')

c = conn.cursor()

# Select whichever rows you want in whatever order you like
c.execute('select ROWID, Name, Type, PID from PID')

writer = UnicodeWriter(open("ProductListing.csv", "wb"))

# Make sure the list of column headers you pass in are in the same order as your SELECT
writer.writerow(["ROWID", "Product Name", "Product Type", "PID", ])
writer.writerows(c)

I have tried adding the 'replace' error handler as suggested in "Python: Convert Unicode to ASCII without errors for CSV file", but I get the same error.

The error is a UnicodeDecodeError:

Traceback (most recent call last):
  File "SQLite2CSV1.py", line 53, in <module>
    writer.writerows(c)
  File "SQLite2CSV1.py", line 32, in writerows
    self.writerow(row)
  File "SQLite2CSV1.py", line 19, in writerow
    self.writer.writerow([unicode(s).encode("utf-8") for s in row])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 65: ordinal not in range(128)
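The traceback can be reproduced outside the script. This is a minimal sketch, assuming the failing cell is a byte string (the sample value is hypothetical). In Python 2, unicode(s) decodes a byte string with the default ASCII codec, so a byte such as 0xa0 fails before encode("utf-8") ever runs. Decoding explicitly with a chosen encoding, then encoding to ASCII with 'replace', yields '?' instead. The snippet runs on Python 2.7 and 3.

```python
# Hypothetical cell value: a byte string containing 0xa0, like the one in the traceback.
raw = b'Widget\xa0Pro'

# Python 2's unicode(raw) performs this implicit ASCII decode, which is what fails.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print('implicit ASCII decode fails: ' + error_message)

# Decoding explicitly with a chosen encoding avoids the error.
# Assumption: the bytes are Latin-1/cp1252, where 0xa0 is a no-break space.
text = raw.decode('latin-1')

# Encoding with errors='replace' turns every non-ASCII character into '?'.
ascii_bytes = text.encode('ascii', 'replace')
print(repr(ascii_bytes))
```

If the source encoding is unknown, raw.decode('utf-8', 'replace') also avoids the exception; undecodable bytes become U+FFFD, which the ASCII 'replace' step then turns into '?'.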

Obviously I want the code to be robust enough that, whenever it encounters a character outside this range, it replaces it with a character such as '?' (\x3f).

Is there a way to do this within the UnicodeWriter class, and a way to make the code robust enough that it won't produce these errors?

Your help is greatly appreciated.

2 Answers


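If you just want to write an ASCII CSV, simply use the stock csv.writer(). To ensure that all values passed are indeed ASCII, call encode('ascii', errors='replace') on each value, which replaces every non-ASCII character with '?'. This assumes the values are unicode strings, as in the example below (keyword arguments to encode() require Python 2.7).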

Example:

import csv

rows = [
  [u'some', u'other', u'more'],
  [u'umlaut:\u00fd', u'euro sign:\u20ac', '']
]

with open('/tmp/test.csv', 'wb') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        asciifiedRow = [item.encode('ascii', errors='replace') for item in row]
        print '%r --> %r' % (row, asciifiedRow)
        writer.writerow(asciifiedRow)

The console output for this is:

[u'some', u'other', u'more'] --> ['some', 'other', 'more']
[u'umlaut:\xfd', u'euro sign:\u20ac', ''] --> ['umlaut:?', 'euro sign:?', '']

The resulting CSV file contains:

some,other,more
umlaut:?,euro sign:?,
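The example assumes every value is a unicode string. Rows from the question's query also contain integers (ROWID) and, judging by the traceback, byte strings; calling .encode('ascii', ...) on those either fails (integers have no encode method) or triggers the same implicit ASCII decode. Below is a sketch of a hypothetical helper, asciify(), that normalises each cell first, assuming byte strings are UTF-8 (pass a different source_encoding otherwise). It runs on Python 2.7 and 3.

```python
def asciify(value, source_encoding='utf-8'):
    """Return value as ASCII-only text, with each non-ASCII character replaced by '?'.

    Assumption: cells are text, byte strings, numbers or None.
    Works on Python 2.7 (where bytes is str) and Python 3.
    """
    if value is None:
        return u''
    if isinstance(value, bytes):
        # Decode explicitly instead of letting unicode() assume ASCII;
        # undecodable bytes become U+FFFD, which is replaced below.
        value = value.decode(source_encoding, 'replace')
    elif not isinstance(value, type(u'')):
        # Numbers such as ROWID: use their text form.
        value = u'%s' % (value,)
    # Round-trip through ASCII so every non-ASCII character becomes '?'.
    return value.encode('ascii', 'replace').decode('ascii')


row = (1, u'umlaut:\u00fd', b'Widget\xa0Pro', None)
print([asciify(cell) for cell in row])
```

In the question's UnicodeWriter, writerow could call asciify(s) where it currently calls unicode(s); the result is then pure ASCII, so the later UTF-8 encoding steps cannot fail.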

1 Comment

+1 spot on. Also note that UTF-8 isn't ASCII, so feeding a UTF-8 byte string to a function that expects ASCII will usually have hilarious, unintended results. A UnicodeDecodeError is the most obvious; some effects are more subtle.
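On Python 3 (not the version used in the question), the sqlite3 module returns TEXT columns as str and csv.writer works with text directly, so the UnicodeWriter recipe is unnecessary. The ASCII conversion can instead be delegated to the output stream by giving it encoding='ascii' and errors='replace'. This is a sketch under that assumption; the in-memory table is hypothetical sample data using the question's column names in place of test.db.

```python
import csv
import io
import sqlite3


def export_ascii_csv(conn, text_stream):
    """Write the PID table as CSV; the stream's encoding handles non-ASCII text."""
    cursor = conn.execute('select ROWID, Name, Type, PID from PID')
    writer = csv.writer(text_stream)
    writer.writerow(['ROWID', 'Product Name', 'Product Type', 'PID'])
    writer.writerows(cursor)  # integers are written via str(); None becomes ''


# Hypothetical sample data standing in for test.db.
conn = sqlite3.connect(':memory:')
conn.execute('create table PID (Name text, Type text, PID integer)')
conn.execute('insert into PID values (?, ?, ?)', (u'Caf\u00e9\u00a0Latte', u'Drink', 1001))

# An in-memory byte buffer wrapped in an ASCII text layer that writes '?'
# for any character it cannot encode.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding='ascii', errors='replace', newline='')
export_ascii_csv(conn, stream)
stream.flush()
print(raw.getvalue().decode('ascii'))
```

For the real export, the same function can be given a file opened with open('ProductListing.csv', 'w', newline='', encoding='ascii', errors='replace').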

With access to a Unix environment, here's what worked for me:

sqlite3.exe a.db .dump > a.sql;
LC_ALL=C tr -d '\200-\377' < a.sql > clean.sql;
sqlite3.exe clean.db < clean.sql;

(This is not a Python solution, but it may help someone else because of its brevity. Note that it STRIPS OUT all non-ASCII characters rather than replacing them.)
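For comparison, the same stripping can be done in Python. This sketch uses two hypothetical helpers: one deletes bytes 0x80-0xFF (octal 200-377) from a byte string, as tr does; the other drops non-ASCII characters from a text string using the 'ignore' error handler instead of 'replace'. It runs on Python 2.7 and 3.

```python
# Every byte from 0x80 to 0xFF, the range tr deletes with octal 200-377.
NON_ASCII_BYTES = bytes(bytearray(range(128, 256)))


def strip_high_bytes(data):
    """Delete non-ASCII bytes from a byte string (the tr approach, in Python)."""
    return data.translate(None, NON_ASCII_BYTES)


def strip_non_ascii(text):
    """Drop non-ASCII characters from a text string via the 'ignore' error handler."""
    return text.encode('ascii', 'ignore').decode('ascii')


print(repr(strip_high_bytes(b"INSERT INTO PID VALUES('Caf\xc3\xa9 [x]');")))
print(repr(strip_non_ascii(u'umlaut:\u00fd')))
```

Because every byte of a multi-byte UTF-8 character is 0x80 or above, deleting that byte range removes whole characters from UTF-8 data rather than leaving fragments behind.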
