I need to convert multiple CSV files (with different encodings) into UTF-8.

Here is my code:

#find encoding and if not in UTF-8 convert it

import os
import sys
import glob
import chardet
import codecs

myFiles = glob.glob('/mypath/*.csv')

csv_encoding = []

for file in myFiles:
  with open(file, 'rb') as opened_file:
     bytes_file=opened_file.read()
     result=chardet.detect(bytes_file)
     my_encoding=result['encoding']
     csv_encoding.append(my_encoding)
        
print(csv_encoding)

for file in myFiles:
  if csv_encoding in ['utf-8', 'ascii']:
    print(file + ' in utf-8 encoding')
  else:
    with codecs.open(file, 'r') as file_for_conversion:
      read_file_for_conversion = file_for_conversion.read()
    with codecs.open(file, 'w', 'utf-8') as converted_file:
       converted_file.write(read_file_for_conversion)
    print(file +' converted to utf-8')

When I try to run this code I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 5057: invalid continuation byte

Can someone help me? Thanks!!!

  • Does this answer your question? How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte" Commented Jun 21, 2020 at 16:56
  • my_encoding in your second for-loop always has the last value from the first for-loop, which is unlikely to be correct. Commented Jun 21, 2020 at 16:59
  • Well, when you read the file, specify the encoding. Commented Jun 21, 2020 at 17:30
  • The problem is I have about 20 csv files with different encodings that I need to convert to utf-8 weekly in order to work with them. My idea is to automate this process. Commented Jun 21, 2020 at 17:45
  • @aline - Did lenz's response help? If so, please be sure to "upvote" and "accept" it. Otherwise, please update your post with what else you've tried and where you're blocked. Commented Jun 24, 2020 at 16:22

1 Answer

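In your second loop, csv_encoding is the whole list, so the test csv_encoding in ['utf-8', 'ascii'] is never true and every file goes through the conversion branch. There, codecs.open(file, 'r') is called without an encoding, so the file is decoded with the default encoding instead of the detected one, which is where the UnicodeDecodeError comes from. You need to zip the lists myFiles and csv_encoding to get their values aligned: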

for file, encoding in zip(myFiles, csv_encoding):
    ...

And you need to specify that value in the open() call:

    ...
    with codecs.open(file, 'r', encoding=encoding) as file_for_conversion:

Note: in Python 3 there's no need to use the codecs module for opening files. Just use the built-in open function and specify the encoding with the encoding parameter.
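
Putting the two fixes and the note together, the conversion step could look roughly like the sketch below. The helper name convert_to_utf8 and the UTF8_COMPATIBLE set are my own, not something from your code, and I'm assuming the encoding names reported by chardet are ones Python's open() accepts. It also skips files where the detected encoding is None, which chardet can return when it cannot make a guess.

```python
# Hypothetical helper (name and structure are assumptions, not from the question).
UTF8_COMPATIBLE = {"utf-8", "ascii"}


def convert_to_utf8(path, source_encoding):
    """Re-encode one file in place from source_encoding to UTF-8.

    Returns True if the file was rewritten, False if it was left alone.
    """
    if source_encoding is None:
        # No usable guess for this file, so do not touch it.
        return False
    if source_encoding.lower() in UTF8_COMPATIBLE:
        # ASCII is a subset of UTF-8, so nothing to convert.
        return False
    # Decode with the detected encoding rather than the platform default.
    # newline="" keeps the original line endings unchanged.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return True
```

With the lists from the question, you would call it as convert_to_utf8(file, encoding) inside for file, encoding in zip(myFiles, csv_encoding). Since the file is overwritten in place and the detection is only a guess, it is probably worth keeping a backup of the originals until you have checked the output.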
