I am reading data into pandas from an SQL Server 2014 12.0.4100 SP1 database. The data is stored in the Windows-1252 encoding.

I am using Python 2.7.

I want to output the resulting dataframe to Excel or csv. Specifically:

import pandas as pd
import pyodbc
cnxn = pyodbc.connect(r'Driver={SQL Server};Server=.\my_server;Database=my_db;Trusted_Connection=yes;')
sql = "select * from my_table"
df = pd.read_sql(sql, cnxn)
df.to_csv("my_csv.csv", encoding="utf-8")

However, this fails with the error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 13966: invalid start byte

What do I need to do to successfully export to a utf-8 csv?

2 Answers

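The solution is to explicitly re-encode every column that contains non-ASCII characters from Windows-1252 to UTF-8. The byte 0x92 in the error message is the Windows-1252 encoding of the right single quotation mark (’), which is not a valid UTF-8 start byte.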

You can do this using the following code:

def convert(my_str):
    # Python 2: my_str is a byte string holding Windows-1252 data
    return my_str.decode('Windows-1252').encode('utf-8')

# Repeat for each affected column
df["Name"] = df["Name"].apply(convert)

Once converted, you will be able to write to .csv and Excel format without problems.
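
If the table has several text columns, or some rows contain NULLs, the same idea can be wrapped in a more defensive helper. This is only a minimal sketch: `to_utf8` is a hypothetical name, and it assumes the driver returns Windows-1252 byte strings (as it does for `str` on Python 2). Anything that is not a byte string is passed through unchanged.

```python
def to_utf8(value, source_encoding="cp1252"):
    """Re-encode a byte string from source_encoding to UTF-8.

    Hypothetical helper: anything that is not a byte string (None,
    numbers, already-decoded text) is returned unchanged.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(source_encoding).encode("utf-8")
    return value


# 0x92 is the Windows-1252 byte for the right single quotation mark
sample = b"Bob\x92s"
converted = to_utf8(sample)
print(repr(converted))
```

With pandas, this could be applied to each text column in turn, for example `df[col] = df[col].apply(to_utf8)` for every column whose dtype is `object`.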

Did you try using 'ISO-8859-2' as the encoding?

df.to_csv("my_csv.csv", encoding="ISO-8859-2") 
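
Before switching encodings, it may help to check how the offending byte decodes under each candidate. The sketch below uses only the standard codecs, and its variable names are illustrative. In Windows-1252, 0x92 is the right single quotation mark, whereas ISO-8859-2 maps it to a C1 control character, so the two encodings are probably not interchangeable for this data.

```python
raw = b"\x92"  # the byte reported in the UnicodeDecodeError

as_cp1252 = raw.decode("cp1252")      # U+2019 RIGHT SINGLE QUOTATION MARK
as_latin2 = raw.decode("iso-8859-2")  # U+0092, a C1 control character

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0x92 is a continuation byte, not a valid start byte
```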
