I'm trying to encode csv file to utf8 using python

Question

I'm using Python to read many CSV files and re-encode them as UTF-8. I tried it with the code below:

import os
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv" ):
    path_to_dir = os.path.normpath(path_to_dir)
    filenames = listdir(path_to_dir)
#Check *csv directory

    fp = lambda f: not os.path.isdir(path_to_dir+"/"+f) and f.endswith(suffix)
    return [path_to_dir+"/"+fname for fname in filenames if fp(fname)]

def convert_files(files, ascii, to="utf-8"):
    count = 0
    lineno = 0
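    for name in files:
        lineno = lineno+1
        # NOTE: the source encoding is hard-coded as latin-1; the `ascii` argument is never used
        with open(name, mode='r', encoding='latin-1') as file_target:
            file_content = file_target.read()

        print(lineno)
        # Write the decoded text back out as UTF-8; `with` closes the file so the data is flushed
        with open("./csv/data{}.csv".format(lineno), mode='w', encoding='utf-8') as file_source:
            file_source.write(file_content)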

csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")

The problem is that after I read the data and write it to new files with the encoding set to UTF-8, it still does not work.

file_target is opened with encoding='latin-1'. Is that a mistake?
– user2363448, Commented Dec 13, 2013 at 5:44
related: Read many csv file and write it to encoding to utf8 using python
– jfs, Commented Dec 13, 2013 at 5:57
You read with encoding='latin-1' and write with encoding='utf-8'. Did you intend to read cp866 instead? Then it should be easy to see where the problem is.
– Janne Karila, Commented Dec 13, 2013 at 8:09
I already tried both cp866 and latin-1, but it still does not work. After running the script I get a new CSV file, but when I open it Notepad++ still shows the encoding as ANSI, the same as the original file I tried to convert.
– user3098171, Commented Dec 13, 2013 at 8:17

flyer · Accepted Answer · 2013-12-13 06:55:27Z

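Before opening a file whose encoding is unclear, you could use chardet to detect the encoding rather than opening the file with a guessed one. Note that chardet.detect expects the file's raw bytes, not its path, so the file has to be read in binary mode first. Usage looks like this:

>>> import chardet
>>> with open('PATH/TO/FILE', 'rb') as f:
...     encoding = chardet.detect(f.read())['encoding']
...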

And then open the file with the encoding detected and write the contents into a file opened with 'utf-8' encoding.
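
The detect-then-rewrite steps described above could be sketched roughly as follows. This is only an illustration, not the asker's or answerer's exact code: the function names `detect_encoding` and `convert_to_utf8` are hypothetical, chardet is a third-party package that has to be installed separately, and its guess can be wrong for short or ambiguous files, so passing a known encoding explicitly is safer when you have one.

```python
from pathlib import Path


def detect_encoding(path):
    """Guess a file's encoding from its raw bytes (hypothetical helper).

    Requires the third-party chardet package; imported lazily so the
    rest of this sketch works without it when the encoding is known.
    """
    import chardet

    raw = Path(path).read_bytes()
    # chardet.detect takes bytes and returns a dict with an 'encoding' key,
    # which may be None if no guess could be made.
    return chardet.detect(raw)["encoding"]


def convert_to_utf8(src, dst, source_encoding=None):
    """Decode src with source_encoding (detected if omitted), write dst as UTF-8."""
    if source_encoding is None:
        source_encoding = detect_encoding(src)
    if source_encoding is None:
        raise ValueError("could not determine the encoding of {}".format(src))
    # Work on bytes so that line endings are preserved exactly.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return source_encoding
```

For example, `convert_to_utf8("in.csv", "out.csv", "cp866")` skips detection entirely, while `convert_to_utf8("in.csv", "out.csv")` relies on chardet's guess.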

If you're not sure whether the file was actually converted to 'utf-8', you could use enca to check whether its encoding is 'ASCII' or 'utf-8', like this in a Linux shell:

$ enca FILENAME
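
Where enca is not available (for instance on Windows), a similar check can be approximated in Python by trying to decode the file's bytes. The sketch below is an assumption-laden illustration, and `classify_bytes` is a hypothetical name. It also hints at one possible reason an editor might still report "ANSI" after conversion: a file containing only ASCII characters is byte-for-byte identical in ASCII, most ANSI code pages, and UTF-8, so there is nothing for the editor to tell apart.

```python
def classify_bytes(raw):
    """Return 'ascii', 'utf-8', or 'other' for a bytes object (hypothetical helper)."""
    try:
        raw.decode("ascii")
        return "ascii"  # pure ASCII is also valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"  # contains non-ASCII characters that are valid UTF-8
    except UnicodeDecodeError:
        return "other"  # some legacy encoding such as latin-1 or cp866


def classify_file(path):
    """Read a file in binary mode and classify its contents."""
    with open(path, "rb") as f:
        return classify_bytes(f.read())
```

A result of 'other' for a converted file would suggest the conversion did not take effect, whereas 'ascii' means the file has no characters whose encoding could differ.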

