Convert all CSV files from ANSI encoding to UTF-8 using Python

Question

I have the Python code below:

import os
from os import listdir

def find_csv_filenames( path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ filename for filename in filenames if filename.endswith( suffix ) ]
    # I always get the error on the line below
filenames = find_csv_filenames('C:\casperjs\project\teleservices\csv')
for name in filenames:
    print name

I get an error on this line:

filenames = find_csv_filenames('C:\casperjs\project\teleservices\csv')
Error message: `TabError: inconsistent use of tabs and spaces in indentation`

What I need: I want to read all the CSV files and convert them from ANSI encoding to UTF-8, but the code above only reads the path of each CSV file. What is wrong with it?

Format your code and post full error message, please.

– graphite, Commented Dec 12, 2013 at 8:12
OK, thanks. I have now added the error message.

– user3024562, Commented Dec 12, 2013 at 8:22
You should fix the indentation first.

– graphite, Commented Dec 12, 2013 at 8:30
Is the error gone after formatting?

– graphite, Commented Dec 12, 2013 at 8:51

Michael Kazarian · Accepted Answer · 2013-12-12 09:30:16Z

1

The code below prints each line of every CSV file, re-encoded from the source encoding to UTF-8:

import os
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv"):
    path_to_dir = os.path.normpath(path_to_dir)
    filenames = listdir(path_to_dir)
    # Skip directories whose names happen to end with the suffix
    fp = lambda f: not os.path.isdir(os.path.join(path_to_dir, f)) and f.endswith(suffix)
    return [os.path.join(path_to_dir, fname) for fname in filenames if fp(fname)]

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        print "Convert {0} from {1} to {2}".format(name, ascii, to)
        with open(name) as f:
            for line in f:
                # Decode from the source encoding, then re-encode to the target
                print unicode(line, ascii).encode(to)

csv_files = find_csv_filenames('/path/to/csv/dir', ".csv")
convert_files(csv_files, "cp866")  # cp866 is my source encoding. Replace it with yours.
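
The answer's code prints the converted lines rather than saving them. As a rough sketch of how the files themselves could be rewritten, here is a Python 3 variant. The names convert_file and convert_csv_dir are made up for this illustration, and the default "cp1252" is only an assumption about what "ANSI" means on a Western European Windows machine; replace it with the actual source encoding.

```python
import os

def convert_file(path, source_encoding="cp1252", target_encoding="utf-8"):
    """Rewrite one file in place from source_encoding to target_encoding."""
    # newline="" leaves the original line endings untouched on read and write
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(path, "w", encoding=target_encoding, newline="") as dst:
        dst.write(text)

def convert_csv_dir(path_to_dir, source_encoding="cp1252", suffix=".csv"):
    """Convert every regular file ending in suffix; return the converted paths."""
    converted = []
    for entry in sorted(os.listdir(path_to_dir)):
        full_path = os.path.join(path_to_dir, entry)
        # Ignore directories, even if their names end with the suffix
        if os.path.isfile(full_path) and entry.endswith(suffix):
            convert_file(full_path, source_encoding)
            converted.append(full_path)
    return converted
```

This overwrites the files, so it is worth trying on a copy of the directory first. Running it twice on the same files would decode already converted UTF-8 text with the old encoding and garble it.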

edited Dec 12, 2013 at 9:30

answered Dec 12, 2013 at 8:43

Francesco Gramano · Accepted Answer · 2013-12-12 08:29:35Z

0

Refer to documentation: http://docs.python.org/2/howto/unicode.html

If you have a string, say it is stored as s, that you want to encode in a specific encoding, use s.encode() with the name of that encoding, for example s.encode("utf-8").
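
A small illustration of the decode-then-encode round trip, written for Python 3 (the linked howto covers Python 2, where the same idea applies to unicode objects). The sample bytes and the "cp1252" source encoding are assumptions chosen for the example.

```python
raw = b"caf\xe9"                    # bytes as they might be stored in a cp1252 ("ANSI") file
text = raw.decode("cp1252")         # bytes -> str, using the source encoding
utf8_bytes = text.encode("utf-8")   # str -> bytes, using the target encoding
print(utf8_bytes)                   # b'caf\xc3\xa9'
```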

answered Dec 12, 2013 at 8:29

luc · Accepted Answer · 2013-12-12 08:54:22Z

0

Your code just lists the CSV files. It doesn't do anything with them. If you need to read them, you can use the csv module. If you need to handle the encoding, you can do something like this:

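import csv, codecs

def safe_csv_reader(the_file, encoding, dialect=csv.excel, **kwargs):
    # Parse the raw byte rows first, then decode each cell to unicode
    csv_reader = csv.reader(the_file, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [codecs.decode(cell, encoding) for cell in row]

# 'data.csv' is a placeholder name; the Python 2 csv module expects binary mode
with open('data.csv', 'rb') as csv_file:
    reader = safe_csv_reader(csv_file, "utf-8", delimiter=',')
    for row in reader:
        print row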
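
In Python 3 the per-cell decoding is not needed, because open() can decode the file before the csv module sees it. A minimal sketch under that assumption; read_csv_rows is a hypothetical helper name, not part of the csv module.

```python
import csv

def read_csv_rows(path, encoding, **kwargs):
    """Yield each row as a list of str, decoding the file with `encoding`."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, newline="", encoding=encoding) as csv_file:
        for row in csv.reader(csv_file, **kwargs):
            yield row
```

For example, list(read_csv_rows("data.csv", "cp1252", delimiter=";")) would return all rows of a semicolon-separated file, where "data.csv" and "cp1252" stand in for the real file name and source encoding.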

answered Dec 12, 2013 at 8:54
