
I have referred to some posts related to Unicode errors but didn't find a solution to my problem. I am converting xlsx to csv from a workbook of 6 sheets, using the following code:

import csv
import os
import xlrd

def csv_from_excel(file_loc):

    # check that the file is readable
    print os.access(file_loc, os.R_OK)
    wb = xlrd.open_workbook(file_loc)
    print wb.nsheets

    sheet_names = wb.sheet_names()
    print sheet_names
    counter = 0

    while counter < wb.nsheets:
        try:
            sh = wb.sheet_by_name(sheet_names[counter])
            file_name = str(sheet_names[counter]) + '.csv'
            print file_name
            fh = open(file_name, 'wb')
            wr = csv.writer(fh, quoting=csv.QUOTE_ALL)

            for rownum in xrange(sh.nrows):
                wr.writerow(sh.row_values(rownum))

        except Exception as e:
            print str(e)

        finally:
            fh.close()
            counter += 1

I get an error on the 4th sheet:

'ascii' codec can't encode character u'\u2018' in position 0: ordinal not in range(128)

but position 0 is blank, and it had converted to csv up to the 33rd row.

I am unable to figure this out. CSV was an easy way to read the content and put it in my data structure.
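For reference, the "position" in a UnicodeEncodeError is the index inside the string being encoded, not a row or column number, so the offending cell value starts with the curly quote. A minimal reproduction of the error (standalone, no spreadsheet needed):

```python
# The 'position' in a UnicodeEncodeError is the index inside the string
# being encoded, so a string starting with U+2018 reports position 0.
try:
    u'\u2018Hello'.encode('ascii')
except UnicodeEncodeError as e:
    print(e)
```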

1 Answer


You'll need to manually encode Unicode values to bytes; for CSV, UTF-8 is usually fine:

for rownum in xrange(sh.nrows):
    wr.writerow([unicode(c).encode('utf8') for c in sh.row_values(rownum)])

Here I use unicode() for column data that is not text.

The character you encountered is U+2018 LEFT SINGLE QUOTATION MARK, which is just a fancy form of the ' single quote. Office software (spreadsheets, word processors, etc.) often auto-replaces straight single and double quotes with the 'fancy' versions. You could also just replace those with ASCII equivalents. You can do that with the Unidecode package:

from unidecode import unidecode

for rownum in xrange(sh.nrows):
    wr.writerow([unidecode(unicode(c)) for c in sh.row_values(rownum)])

Use this when non-ASCII codepoints are only used for quotes and dashes and other punctuation.


3 Comments

Thanks a lot @martijn-pieters. The first example, with direct encoding to UTF-8, seems to work. Is using Unidecode a fool-proof way? Why does this happen? Can't we explicitly declare the coding standard for the whole file?
@nij_wiz: The CSV module in Python 2 cannot handle Unicode; it was written well ahead of Unicode support in Python. This has been fixed in Python 3. Unidecode is a pragmatic method to ensure the data only uses ASCII codepoints, by replacing any non-ASCII text with ASCII equivalents where available. Whether this is fool-proof depends on your exact data.
@martijn-pieters: Yup, I am using 2.7, hence this problem. I am using a lot of third-party libraries which are yet to be ported to Python 3; I use 3.4 for networking and other jobs. Thanks for your valuable input.
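As the comments note, Python 3's csv module handles Unicode natively, so no per-cell encoding is needed there. A minimal standalone sketch (no xlrd; an in-memory buffer stands in for the output file, which you would normally open with open(name, 'w', encoding='utf-8', newline='')):

```python
import csv
import io

# Sample row containing the curly quotes that break the Python 2 csv writer.
rows = [[u'\u2018quoted\u2019', u'caf\xe9', 42]]

# In Python 3 the csv writer accepts str values directly; for a real file,
# open it in text mode with an explicit encoding and newline=''.
buf = io.StringIO()
wr = csv.writer(buf, quoting=csv.QUOTE_ALL)
for row in rows:
    wr.writerow(row)

print(buf.getvalue())
```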
