1

I'm trying to write a table from a .csv file with Hebrew text in it to an sql server database.
the table is valid and pandas reads the data correct (even displays the hebrew properly in pycharm),
but when i try to write it to a table in the database i get question marks ("???") where the Hebrew should be.

this is what i've tried, using pandas and sqlalchemy:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pymssql://server/test?charset=utf8')
connection = engine.connect()

df = pd.read_csv("temp.csv", low_memory=False, encoding="UTF-8")
table_name = "test"
df.to_sql(table_name, connection, index=False, if_exists="append")

this loads the table properly but fails to write the Hebrew,
any suggestions?

7
  • What is the datatype of your column? Commented Nov 25, 2018 at 13:58
  • its read into a string in python and written as text into sql server. Commented Nov 25, 2018 at 14:00
  • Yes, but what is the datatype of your column? "String" is not a SQL Server data type. Commented Nov 25, 2018 at 14:02
  • as i've answered TEXT in sql server. Commented Nov 25, 2018 at 14:03
  • text has been deprecated since 2005, and doesn't support non-ansi characters. The unicode equivalent would be ntext, however, that too is deprecated. Change your column's datatype to nvarchar(MAX) in you need to store 4000+ character values with non-ansi characters. Commented Nov 25, 2018 at 14:05

1 Answer 1

4

You need to change the datatype of your column. text is deprecated, and varchar(MAX) should be used instead, however, neither can store unicode characters. To store unicode characters you would need to use ntext, which is also deprecated; you need to use nvarchar(MAX).

To change your column's definition, you can use this pseudo-SQL (You'll need to replace the parts in braces ({}) with the appropriate object names):

ALTER TABLE {YourTable} ALTER COLUMN {YourColumn} nvarchar(MAX);

Edit: note, this will not restore any data lost in your column. Once a non-ANSI character is inserted into a varchar (or similar) datatype the data is immediately lost and cannot be recovered apart from by changing the datatype and reentry.

Sign up to request clarification or add additional context in comments.

8 Comments

Sorry about the long accept time, took me some time till I could verify.
For SQL Server how it should be? i'm facing the same problem
This answer is for SQL Server @M.Mariscal .
can you help me with this question please? stackoverflow.com/questions/61813596/…
My skills on Python(3) are beginner at best, @M.Mariscal .
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.