I'm having difficulty writing values from a pandas.DataFrame that contain non-ASCII characters to an Oracle database. Here is a reproducible example (given a real connection string):
import pandas as pd
from sqlalchemy import create_engine, Unicode, NVARCHAR
connection_string = 'oracle://<name>:<password>@<database>'
df = pd.DataFrame([
    ['Société Générale']
], columns=['firm'])
conn = create_engine(connection_string, encoding='utf-8')
dtypes = {'firm': Unicode(40)}
df.to_sql('test', con=conn, dtype=dtypes, if_exists='replace')
The error produced looks like
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)
I guess the question is: how do I get it to use UTF-8 encoding when writing? I know the default value of encoding in create_engine is 'utf-8', and I thought it would control the encoding used.
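One angle I've been wondering about (a guess on my part): cx_Oracle decides the client-side character set from the NLS_LANG environment variable, so maybe it needs to be forced to UTF-8 before the first connection is made. The '.AL32UTF8' value below sets only the character-set portion:

import os

# Assumption: cx_Oracle reads NLS_LANG when the Oracle client first
# connects, so it has to be set before the engine is used.
os.environ['NLS_LANG'] = '.AL32UTF8'

conn = create_engine(connection_string, encoding='utf-8')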
I've also tried dtypes = {'firm': NVARCHAR(40, convert_unicode=True)} but get the same error.
I tried encoding the data before writing (df['firm'] = df['firm'].str.encode('utf-8')), which does get around this problem, only to lead to bigger problems.
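For context, this is how I'm checking the round trip (table and column names as in the example above); after pre-encoding, the values no longer come back as the original strings:

# Read the table back to compare against the original DataFrame.
df2 = pd.read_sql('SELECT firm FROM test', conn)
print(df2['firm'][0])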
This seems like a straightforward problem, but I've spent hours looking through the docs and SO and can't figure out what to do.
Versions used: Python 3.6, pandas 0.20, sqlalchemy 1.11.
To see what is actually being stored on the Oracle side, I've been inspecting the values with DUMP(firm, 1016), which shows the character set and the raw bytes of each stored string.
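In case it's useful, this is a sketch of how I run that check from Python with the same engine (table and column names as in the example above):

# Show the stored bytes and character set for each row.
result = conn.execute('SELECT DUMP(firm, 1016) FROM test')
for row in result:
    print(row[0])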