
I have been trying to save data as an Excel file of type CSV UTF-8 (Comma delimited) (*.csv), which is different from the normal CSV (Comma delimited) (*.csv) file: it displays the Unicode text correctly when opened in Excel. I can save in that format easily from Excel, but from Python I am only able to save a normal CSV. That does not lose any data, but when the file is opened it shows text like "à¤à¤‰à¤Ÿà¤¾" instead of "एउटा".
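The garbled text is what the UTF-8 bytes of the word look like when read in a legacy 8-bit code page. Assuming Excel falls back to Windows-1252 (cp1252) when the file has no BOM (an assumption about a Western-locale Windows setup), the effect can be reproduced in Python. errors="ignore" is used because cp1252 has no character for byte 0x8F:

```python
# Reproduce the mojibake: encode the Devanagari word as UTF-8, then decode
# those bytes as if they were Windows-1252 (assumed to be Excel's fallback).
word = "एउटा"
utf8_bytes = word.encode("utf-8")

# Byte 0x8F (part of "ए") is undefined in cp1252, so "ignore" drops it.
garbled = utf8_bytes.decode("cp1252", errors="ignore")
print(garbled)  # à¤à¤‰à¤Ÿà¤¾

# Decoding the same bytes as UTF-8 gives the original word back.
assert utf8_bytes.decode("utf-8") == word
```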

If I open the file in Notepad, copy the text into an Excel sheet, and then manually save it as CSV UTF-8, the text displays correctly. But this is time-consuming, since all the values appear on the same line in Notepad and I have to separate them in Excel. So I just want to know how to save data in Excel's CSV UTF-8 format using Python.

I have tried the following code, but it produces a normal CSV file:

import codecs
import unicodecsv as csv

input_text = codecs.open('input.txt', encoding='utf-8')
all_text = input_text.read()
text_list = all_text.split()

output_list = [['Words','Tags']]
for input_word in text_list:
    word_tag_list = [input_word,'O']
    output_list.append(word_tag_list)

with codecs.open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(output_list)

2 Answers


You need to tell Excel that this is a UTF-8 file. Unfortunately, the only way to do this is to prepend a special byte sequence, the UTF-8 byte order mark (BOM, bytes EF BB BF), to the start of the file. Python will do this automatically if you use the utf_8_sig encoding:

with codecs.open("output.csv", "w", encoding="utf_8_sig") as f:
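On Python 3 the BOM can come from the file object instead of the CSV library. A minimal sketch, assuming Python 3 and swapping unicodecsv for the standard library csv module (which writes str, not bytes, so it works with a text-mode file); the sample rows are made up:

```python
import csv

# Hypothetical sample rows standing in for the question's output_list.
output_list = [["Words", "Tags"], ["एउटा", "O"], ["भने", "O"]]

# The "utf-8-sig" encoding writes the UTF-8 BOM (EF BB BF) once, at the
# start of the file. newline="" lets the csv module choose line endings,
# as the csv documentation recommends for files passed to csv.writer.
with open("output.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)  # the default dialect is "excel"
    writer.writerows(output_list)
```

Putting the encoding on open() keeps the BOM handling in one place: the text stream adds it before the first write, and the csv writer never deals with bytes.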

2 Comments

with codecs.open("output.csv", "w", encoding="utf_8_sig") as f: results in "TypeError: utf_8_encode() argument 1 must be str, not bytes".
Yes, I have tried that; the same error occurs when encoding="utf_8_sig" is specified.
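The error in these comments is consistent with a type mismatch: a utf_8_sig stream writer encodes str to bytes itself, so it rejects input that is already bytes, while unicodecsv writes already-encoded bytes (which is why the question opens the file in "wb" mode). A small sketch reproducing the mismatch with an in-memory buffer; the exact error message may differ between Python versions:

```python
import codecs
import io

# codecs.getwriter returns the StreamWriter class for the codec; wrap an
# in-memory binary buffer so nothing is written to disk.
stream = codecs.getwriter("utf_8_sig")(io.BytesIO())

stream.write("Words,Tags\r\n")  # str is encoded (BOM + UTF-8) and accepted

try:
    stream.write(b"Words,Tags\r\n")  # bytes are rejected
    bytes_rejected = False
except TypeError as exc:
    bytes_rejected = True
    print("TypeError:", exc)
```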

I found the answer. With unicodecsv, encoding="utf_8_sig" should be passed to the csv.writer method (not to open()) so that the file is written as Excel's CSV UTF-8. The previous code can be rewritten as:

with open("output.csv", "wb") as f:
    writer = csv.writer(f, dialect='excel', encoding='utf_8_sig')
    writer.writerows(output_list)
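Whichever way the file is written, one way to confirm that Python actually wrote the BOM Excel relies on is to check the file's first three bytes. A small sketch; the helper name has_utf8_bom and the file names are hypothetical, and the check looks only at the start of the file:

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

# Example: compare a file written with "utf-8" against one written with
# "utf-8-sig" (hypothetical file names).
for name, enc in [("plain.csv", "utf-8"), ("excel_utf8.csv", "utf-8-sig")]:
    with open(name, "w", encoding=enc, newline="") as f:
        f.write("Words,Tags\r\n")
    print(name, has_utf8_bom(name))
```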

However, there was a problem when a word ended with a comma, e.g. "भने,". In this case I didn't need the comma, so I removed it with the following code inside the for loop:

import re
if re.search(r'.,$',input_word):
    input_word = re.sub(',$','',input_word)
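The regular-expression check can also be written with plain string methods. A sketch using a hypothetical helper, strip_trailing_comma, with made-up input; note that str.rstrip(",") would behave differently, removing every trailing comma rather than just one:

```python
def strip_trailing_comma(word):
    """Drop one trailing comma from `word`, unless the comma is the whole word."""
    # Same effect as the re.search(r'.,$', ...) / re.sub(',$', '', ...) pair
    # for words produced by str.split(), which never contain newlines.
    if len(word) > 1 and word.endswith(","):
        return word[:-1]
    return word

# Hypothetical input standing in for the contents of input.txt.
text_list = "एउटा भने, ,".split()

output_list = [["Words", "Tags"]]
for input_word in text_list:
    output_list.append([strip_trailing_comma(input_word), "O"])
print(output_list)
```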

Finally, I got the desired output: the Unicode characters display correctly and the extra comma at the end of the data is removed. If anyone knows how to make Excel ignore a trailing comma in the data instead, please comment here. Thanks.
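On the remaining question: with its default quoting (csv.QUOTE_MINIMAL), the standard library csv writer puts quotes around any field that contains the delimiter, so a trailing comma stays inside the cell text rather than creating an extra column when the file is opened in Excel. Removing the comma in Python before writing, as done above, therefore seems a reasonable approach. A quick check with an in-memory buffer (standard library csv assumed; unicodecsv was not tested here):

```python
import csv
import io

# Write one row to an in-memory buffer to see how the csv module
# handles a field that ends with the delimiter.
buf = io.StringIO()
csv.writer(buf).writerow(["भने,", "O"])
print(buf.getvalue())  # "भने,",O
```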
