I'm using Python to read many CSV files and re-encode them as UTF-8. I tried the code below:

import os
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv"):
    path_to_dir = os.path.normpath(path_to_dir)
    filenames = listdir(path_to_dir)

    # Keep only regular files (not directories) whose names end with suffix.
    fp = lambda f: not os.path.isdir(os.path.join(path_to_dir, f)) and f.endswith(suffix)
    return [os.path.join(path_to_dir, fname) for fname in filenames if fp(fname)]

def convert_files(files, ascii, to="utf-8"):
    # Note: the `ascii` argument is never used; every file is read as latin-1.
    lineno = 0
    for name in files:
        lineno = lineno + 1
        with open(name, mode='r', encoding='latin-1') as file_target:
            file_content = file_target.read()

        print(lineno)
        # Output is written next to the inputs, in the same ./csv directory.
        with open("./csv/data{}.csv".format(lineno), mode='w', encoding=to) as file_source:
            file_source.write(file_content)

csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")

The problem is that after I read the data and write it to new files with the encoding set to UTF-8, it still does not work.

  • file_target is opened with encoding='latin-1'? Is that a mistake? Commented Dec 13, 2013 at 5:44
  • related: Read many csv file and write it to encoding to utf8 using python Commented Dec 13, 2013 at 5:57
  • stackoverflow.com/questions/20537981/… Commented Dec 13, 2013 at 6:57
  • You read encoding='latin-1' and write encoding='utf-8'. Did you intend to read cp866 instead? Then it should be easy to see where the problem is. Commented Dec 13, 2013 at 8:09
  • I already tried both cp866 and latin-1, but it still does not work. After running the script I get a new CSV file, but when I open it, Notepad++ still shows the encoding as ANSI, the same as the original file I am trying to convert. Commented Dec 13, 2013 at 8:17

1 Answer

Before opening a file whose encoding is unclear, you can use chardet to detect the encoding rather than guessing one. Usage looks like this:

>>> import chardet
>>> raw = open('PATH/TO/FILE', 'rb').read()  # detect() expects bytes, not a path
>>> encoding = chardet.detect(raw)['encoding']

Then open the file with the detected encoding and write its contents to a file opened with 'utf-8' encoding.
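
As a rough sketch of that detect-then-rewrite step (not a definitive implementation): the helper names detect_encoding and convert_to_utf8 are made up for illustration, and the "cp866" fallback is only an assumption taken from the question. chardet is treated as optional, and its result is a guess that can be wrong, so passing the source encoding explicitly is safer when you know it.

```python
try:
    import chardet  # third-party package; optional in this sketch
except ImportError:
    chardet = None


def detect_encoding(raw, fallback="cp866"):
    """Guess the encoding of raw bytes; the fallback is an assumption."""
    if chardet is not None:
        guess = chardet.detect(raw)["encoding"]
        if guess:
            return guess
    return fallback


def convert_to_utf8(src, dst, source_encoding=None):
    """Decode src with the given (or detected) encoding, write dst as UTF-8."""
    with open(src, "rb") as f:
        raw = f.read()
    encoding = source_encoding or detect_encoding(raw)
    text = raw.decode(encoding)
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return encoding
```

For example, convert_to_utf8("./csv/in.csv", "./out/in.csv", source_encoding="cp866") would skip detection entirely (both paths are placeholders). Writing to a separate output directory avoids overwriting the inputs.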

If you're not sure whether the file was actually converted to UTF-8, you can use enca in a Linux shell to check whether its encoding is reported as ASCII or UTF-8:

$ enca FILENAME
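
If enca is not available, a similar check can be sketched in Python. The function names below are hypothetical. Strict decoding only shows that the bytes are valid UTF-8, not that the text is what you expect. Also note that a file containing only ASCII characters has identical bytes in ASCII and UTF-8, which may be why some editors keep labelling such a file as ANSI.

```python
def looks_like_utf8(path):
    """Return True if the file's bytes decode cleanly as strict UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_pure_ascii(path):
    """Return True if every byte is below 0x80 (ASCII and UTF-8 coincide)."""
    with open(path, "rb") as f:
        return all(b < 0x80 for b in f.read())
```

If looks_like_utf8 returns True and is_pure_ascii returns False, the file contains non-ASCII characters stored as valid UTF-8 sequences.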