
I'm trying to pull data from SQL Server using pyodbc, load it into a DataFrame, and then export it to an HTML file, but I keep receiving the following Unicode error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 15500: ordinal not in range(128)

Here is my current setup (encoding instructions per docs):

import pyodbc
import pandas as pd

cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='cp1252', to=unicode)
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='cp1252', to=unicode)
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
cursor = cnxn.cursor()

with open('Initial Dataset.sql') as f:
    initial_query = f.read()

cursor.execute(initial_query)
columns = [column[0] for column in cursor.description]
initial_data = cursor.fetchall()
i_df = pd.DataFrame.from_records(initial_data, columns=columns)
i_df.to_html('initial.html')

An odd but useful point to note is that when I try to export a CSV:

i_df.to_csv('initial.csv')

I get the same error, but when I add:

i_df.to_csv('initial.csv', encoding='utf-8')

It works. Can someone help me understand this encoding issue?

Side note: I've also tried using a sqlalchemy connection and pandas.read_sql() and the same error persists.
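
For reference, the SQLAlchemy attempt looked roughly like this (the connection URL is a placeholder standing in for my DSN, not the exact string I used):

import pandas as pd
from sqlalchemy import create_engine

# DSN-based connection; USER/PASSWORD/Planning are placeholders
engine = create_engine('mssql+pyodbc://USER:PASSWORD@Planning')

with open('Initial Dataset.sql') as f:
    initial_query = f.read()

i_df = pd.read_sql(initial_query, engine)
i_df.to_html('initial.html')  # fails with the same UnicodeEncodeError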

  • The error means you are trying to encode to ASCII a Unicode character that is not representable in ASCII. I'm just guessing, but the data frame returned by pandas is probably utf-8 encoded. I suspect the to=unicode is wrong, but that's just a shot in the dark. Commented Nov 1, 2019 at 14:38
  • I understand what the error means, I just don't understand why it's occurring. The dataframe is utf-8 encoded. The docs for pandas.to_html are rather scant. Why would it try to convert to ASCII when generating the HTML? Commented Nov 1, 2019 at 14:44
  • I'm not sure, but I would check the pandas.to_html source code to see what's happening there (maybe the encoding defaults to ASCII, I don't know). Commented Nov 1, 2019 at 14:50
  • You shouldn't need any setencoding/setdecoding calls at all when working with SQL Server, especially not encoding to UTF-8, which SQL Server ODBC does not use (it uses UTF-16, and that is the default encoding for pyodbc). Commented Nov 1, 2019 at 14:53
  • From here: "SQL Server's recent drivers match the specification, so no configuration is necessary. Using the pyodbc defaults is recommended." Commented Nov 1, 2019 at 15:28
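
A minimal sketch of what the last two comments suggest (an illustration, not code from the question): drop the setencoding/setdecoding calls entirely and rely on pyodbc's defaults, since the SQL Server ODBC driver uses UTF-16.

import pyodbc

# No setencoding/setdecoding calls: recent SQL Server ODBC drivers match the
# pyodbc defaults (UTF-16), so no encoding configuration is needed.
cnxn = pyodbc.connect('DSN=Planning;UID=USER;PWD=PASSWORD;')
cursor = cnxn.cursor()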

1 Answer


The second answer on this question seems to be an acceptable workaround, except that Python 2.x users must use io.open, so:

import io

# Render the HTML to a string, then write it out with an explicit utf-8 encoding
html = df.to_html()
with io.open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)

This option is not included in the latest release, but it looks like the next version of pandas will have an encoding option for to_html(); see docs (line 2228).
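
If that option ships as described, the workaround above should reduce to a single call (assuming the parameter is named encoding, as in the linked source):

df.to_html('mypage.html', encoding='utf-8')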


5 Comments

Yes, that's correct. The encoding should be applied to the output file, not to the communication between pyodbc and SQL Server.
The problem ultimately lies in pandas, as to_html() seems to enforce ASCII encoding. It appears they will be fixing that issue in an upcoming release.
"to_html() seems to enforce ASCII encoding" - No, more likely that to_html uses the default encoding for the file when you only pass it a string (filepath) for buf=, and the default string encoding for Python_2 is ASCII.
@GordThompson okay, but with Python 2 and no way to tell the function to use a different encoding, that is practically the same thing, no?
The way to "tell the function" is to pass it a buf argument that is a StringIO-like object instead of just a (string) path.
