I'm having difficulty writing values that contain non-ASCII characters from a pandas.DataFrame to an Oracle database. Here is a reproducible example (given a real connection string):

import pandas as pd
from sqlalchemy import create_engine, Unicode, NVARCHAR

connection_string = 'oracle://<name>:<password>@<database>'

df = pd.DataFrame([
        ['Société Générale']
    ], columns=['firm'])

conn = create_engine(connection_string, encoding='utf-8')
dtypes = {'firm': Unicode(40)}

df.to_sql('test', con=conn, dtype=dtypes, if_exists='replace')

The error produced looks like

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

I guess the question is: how do I get it to use UTF-8 encoding when writing? I know the default value for encoding in create_engine is 'utf-8', and I thought it would control the encoding used. I've also tried dtypes = {'firm': NVARCHAR(40, convert_unicode=True)} but get the same error.

I tried encoding the data before writing (df['firm'] = df.firm.str.encode('utf-8')), which does get around this error but only leads to bigger problems.

This seems like a straightforward problem, but I've spent hours looking at the docs and SO and can't figure out what to do.

Versions used: Python 3.6, pandas 0.20, sqlalchemy 1.11

  • What do you get when you query the data directly via SQL? Also try DUMP(firm, 1016). Commented Jul 24, 2017 at 14:35
  • Thanks for the reply. I'm actually creating a table from scratch so there is nothing to query... Commented Jul 24, 2017 at 14:45
  • And what do you get when you query the table after you have inserted something? Don't say "The data is deleted afterwards." Then I would ask you: "Why do you store the data in the database at all?" Commented Jul 24, 2017 at 15:02
  • And if you try print(sys.getdefaultencoding()), does it say utf-8? Commented Jul 24, 2017 at 16:34
  • @Uvar Yes, utf-8 is the default encoding. Commented Jul 25, 2017 at 8:55

1 Answer

This is an old question, but I struggled with the same issue recently and found a solution that worked for me.

I had to set

os.environ['NLS_LANG'] = ".AL32UTF8"

With that set, the write succeeded. However, I found that inserting data is very slow.
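For context, here is a minimal sketch of where that line could sit relative to the code in the question. It assumes the variable has to be in the environment before the Oracle client library is initialised, so it is set at the top of the script and the database libraries are imported afterwards. The helper name write_firms is hypothetical, and the connection string is a placeholder you would need to supply; I have not verified this against every driver version.

```python
import os

# Assumption: the Oracle client reads NLS_LANG from the environment when it
# is initialised, so set it before any connection is created.
os.environ["NLS_LANG"] = ".AL32UTF8"


def write_firms(connection_string, table_name="test"):
    """Hypothetical helper: write one row of non-ASCII text to Oracle."""
    # Imported here so NLS_LANG is already set when the driver is loaded.
    import pandas as pd
    from sqlalchemy import create_engine, Unicode

    df = pd.DataFrame([["Société Générale"]], columns=["firm"])
    engine = create_engine(connection_string)
    # Pass the engine (not the raw connection string) to to_sql.
    df.to_sql(table_name, con=engine, dtype={"firm": Unicode(40)},
              if_exists="replace")
    return df
```

Calling write_firms('oracle://<name>:<password>@<database>') with real credentials filled in would then perform the write. Setting NLS_LANG in the shell before starting Python should have the same effect.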
