
I'm trying to pull data from SQL Server using pyodbc, load it into a DataFrame, and then export it to an HTML file, but I keep receiving the following Unicode error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 15500: ordinal not in range(128)

Here is my current setup (encoding instructions per docs):

import pyodbc
import pandas as pd

cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='cp1252', to=unicode)
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
cursor = cnxn.cursor()

with open('Initial Dataset.sql') as f:
    initial_query = f.read()

cursor.execute(initial_query)
columns = [column[0] for column in cursor.description]
initial_data = cursor.fetchall()
i_df = pd.DataFrame.from_records(initial_data, columns=columns)
i_df.to_html('initial.html')

An odd but useful point to note is that when I try to export a CSV:

i_df.to_csv('initial.csv')

I get the same error, but when I add:

i_df.to_csv('initial.csv', encoding='utf-8')

It works. Can someone help me understand this encoding issue?

Side note: I've also tried using a sqlalchemy connection and pandas.read_sql() and the same error persists.
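
For reference, the SQLAlchemy attempt looked roughly like this (the connection URL is a placeholder standing in for my DSN, not the exact string I used):

import pandas as pd
from sqlalchemy import create_engine

# DSN-based connection; USER/PASSWORD/Planning are placeholders
engine = create_engine('mssql+pyodbc://USER:PASSWORD@Planning')

with open('Initial Dataset.sql') as f:
    initial_query = f.read()

i_df = pd.read_sql(initial_query, engine)
i_df.to_html('initial.html')  # fails with the same UnicodeEncodeError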

  • The error means you are trying to encode to ASCII a Unicode character that is not representable in ASCII. I'm just guessing, but the data frame returned by pandas is probably utf-8 encoded. I suspect the to=unicode is wrong, but that's just a shot in the dark. Commented Nov 1, 2019 at 14:38
  • I understand what the error means, I just don't understand why it's occurring. The dataframe is utf-8 encoded. The docs for pandas.to_html are rather scant. Why would it try to convert to ASCII when generating the HTML? Commented Nov 1, 2019 at 14:44
  • I'm not sure, but I would check the pandas.to_html source code to see what's happening there (maybe the encoding defaults to ASCII, I don't know). Commented Nov 1, 2019 at 14:50
  • You shouldn't need any setencoding/setdecoding calls at all when working with SQL Server, especially not encoding to UTF-8, which SQL Server ODBC does not use (it uses UTF-16, and that is the default encoding for pyodbc). Commented Nov 1, 2019 at 14:53
  • From here: "SQL Server's recent drivers match the specification, so no configuration is necessary. Using the pyodbc defaults is recommended." Commented Nov 1, 2019 at 15:28
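
A minimal sketch of what the last two comments suggest (an illustration, not code from the question): drop the setencoding/setdecoding calls entirely and rely on pyodbc's defaults, since the SQL Server ODBC driver uses UTF-16.

import pyodbc

# No setencoding/setdecoding calls: recent SQL Server ODBC drivers match the
# pyodbc defaults (UTF-16), so no encoding configuration is needed.
cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cursor = cnxn.cursor()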

1 Answer


The second answer on this question seems to be an acceptable workaround, except that Python 2.x users must use io.open, so:

import io

# Render the HTML to a string, then write it out with an explicit utf-8 encoding
html = df.to_html()
with io.open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)

This option is not included in the latest release, but it looks like the next version of pandas will have an encoding option for to_html(); see docs (line 2228).
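
If that option ships as described, the workaround above should reduce to a single call (assuming the parameter is named encoding, as in the linked source):

df.to_html('mypage.html', encoding='utf-8')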


5 Comments

Yes, that's correct. The encoding should be applied to the output file, not to the communication between pyodbc and SQL Server.
The problem ultimately lies in pandas, as to_html() seems to enforce ASCII encoding. It appears they will be fixing that issue in an upcoming release.
"to_html() seems to enforce ASCII encoding" - No, more likely that to_html uses the default encoding for the file when you only pass it a string (filepath) for buf=, and the default string encoding for Python_2 is ASCII.
@GordThompson okay, but with Python 2 and no way to tell the function to use a different encoding, that is practically the same thing, no?
The way to "tell the function" is to pass it a buf argument that is a StringIO-like object instead of just a (string) path.
