Writing exotic (non-ascii) characters to Oracle DB using pandas.to_sql in Python 3.6

Question

I'm having difficulty writing values from a pandas.DataFrame which contain non-ASCII characters to an Oracle data base. Here is a reproducible example (given an real connection string):

import pandas as pd
from sqlalchemy import create_engine, Unicode, NVARCHAR

connection_string = oracle://<name>:<password>@<database>'

df = pd.DataFrame([
        ['Société Générale']
    ], columns=['firm'])

conn = create_engine(connection_string, encoding='utf-8')
dtypes = {'firm': Unicode(40)}

df.to_sql('test', con=connection_string, dtype=dtypes, if_exists='replace')

The error produced looks like

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

I guess the question is how do I get it to use UTF-8 encoding when writing. I know the default value for encoding in create_engine is utf-8' and I thought it would control the encoding used. I've also tried dtypes = {'firm': NVARCHAR(40, convert_unicode=True)} but get the same error.

I tried encoding the data before writing (df['firm'] = df.firms.str.encode('utf-8')) which does get around this problem only to lead to bigger problems.

This seems like a straightforward problem but I've spent hours looking at the docs and SO and can't figure out what to do.

Versions used are; Python: 3.6, pandas: 0.20, sqlalchemy: 1.11

What do you get when you query the data directly via SQL? Try also DUMP(firm, 1016) — Wernfried Domscheit
– Wernfried Domscheit, Commented Jul 24, 2017 at 14:35
Thanks for the reply. I'm actually creating a table from scratch so there is nothing to query... — JoeCondron
– JoeCondron, Commented Jul 24, 2017 at 14:45
And what do you get when you query the table after you have insert something? Don't say"The data is deleted afterwards." - Then I would ask you: "Why do you store the data in database at all?" — Wernfried Domscheit
– Wernfried Domscheit, Commented Jul 24, 2017 at 15:02
And if you try print(sys.getdefaultencoding()) ? does it say utf-8? — Uvar
– Uvar, Commented Jul 24, 2017 at 16:34

mbednarski · Accepted Answer · 2018-01-14 12:32:21Z

4

It is old question but I have struggled with same issue recently, and found a solution that worked for me.

I had to set

os.environ['NLS_LANG'] = ".AL32UTF8"

And it worked for me. However I found that inserting data is very slow.

answered Jan 14, 2018 at 12:32

mbednarski

7982 gold badges9 silver badges18 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Writing exotic (non-ascii) characters to Oracle DB using pandas.to_sql in Python 3.6

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related