Data truncation when using pandas to_sql into VARCHAR(MAX) SQL column

Asked 6 years, 9 months ago

Modified 5 years, 7 months ago

Viewed 2k times

Problem:

When importing a dataframe into a SQL table, I get a truncation error:

sqlalchemy.exc.DataError: (pyodbc.DataError) ('22001', '[22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The statement has been terminated. (3621)')

The column in question is ProjectDescription (see the column list below). I've quoted the line that produces the error. I'm not entirely sure this is the source of the issue, but I think the column was originally created as VARCHAR(300). The value causing trouble is 307 characters long, and the error output reports 9 characters truncated (see the traceback below), which leaves 298. If the original size limit was 300 and 2 characters go to the surrounding quote marks, that would leave 298, I guess? I'm not sure, but it's a strange length to be truncating to.
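
A note on the arithmetic, offered as a guess rather than a diagnosis: the "(9 characters truncated)" text appears inside the logged parameters, so it may come from the way the error message shortens long values for display rather than from SQL Server itself. The sketch below assumes a display limit of 300 characters applied to the quoted repr of the value. The function name truncate_repr and the 300 limit are assumptions for illustration only. Under that assumption, a 307-character string hides exactly 9 characters.

```python
def truncate_repr(value, max_chars=300):
    """Shorten repr(value) for display, keeping the head and tail.

    Assumed behaviour for illustration; max_chars=300 is a guess, not a
    documented setting.
    """
    rep = repr(value)  # repr adds the two surrounding quote characters
    if len(rep) <= max_chars:
        return rep
    half = max_chars // 2
    dropped = len(rep) - max_chars
    return rep[:half] + " ... (%d characters truncated) ... " % dropped + rep[-half:]


# Stand-in for the 307-character ProjectDescription value.
description = "x" * 307
shown = truncate_repr(description)

# 307 characters + 2 quotes = 309; 309 - 300 = 9 hidden characters.
dropped_count = len(repr(description)) - 300
```

If this reading is right, the 9 missing characters say nothing about the column's size limit, and the value that SQL Server rejects could be a different one.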

Code causing the issue:

dataframe.to_sql('tblImport', eng, schema='dbo', if_exists='append', index=False, dtype=dict_SQL_dtypes)

Attempted solutions:

Setting TEXTSIZE in SQL
Changing column type to VARCHAR / NVARCHAR, changing length to 1000 or MAX
Setting datatypes (dtype) to sqlalchemy.types.VARCHAR (idea from here)
Before import, sorting dataframe by descending length of ProjectDescription column to put longest value in top row
Dropping the tables and recreating the column with VARCHAR(MAX) as the data type, in case the original length of the column was somehow still limiting the import
Altering column type through SQL query directly

Versions:

OS: Windows 10
python: 3.6
pandas: 0.24.1
sqlalchemy: 1.2.17
pyodbc: 4.0.25
DB management tool: MSSMS
SQL server 12

DB connection:

pyodbc.connect("DRIVER={{ODBC Driver 13 for SQL Server}};SERVER={0};DATABASE={1};UID={2};PWD={3}".format(serv, db, usr, pwd))

Engine:

create_engine('mssql+pyodbc://{}@sqlserver:{}@{}:1433/{}?driver=ODBC+Driver+13+for+SQL+Server'.format(usr, pwd, serv, db), echo=True)

ProjectDescription column character_maximum_length from DB schema: -1
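
For reference, SQL Server reports CHARACTER_MAXIMUM_LENGTH as -1 for (MAX) columns, so the value above is consistent with varchar(MAX). Below is a small sketch of how the catalog values could be fetched and read. The query targets INFORMATION_SCHEMA.COLUMNS; the helper name describe_length is hypothetical.

```python
# Parameterised query ("?" placeholders, as pyodbc uses) for declared lengths.
COLUMN_LENGTH_QUERY = (
    "SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH "
    "FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?"
)


def describe_length(data_type, character_maximum_length):
    """Render a catalog row as a readable type string (hypothetical helper)."""
    if character_maximum_length is None:
        return data_type  # non-character types have no length
    if character_maximum_length == -1:
        return "%s(MAX)" % data_type  # -1 marks a (MAX) column
    return "%s(%d)" % (data_type, character_maximum_length)


project_description_type = describe_length("varchar", -1)
project_uid_type = describe_length("varchar", 60)
```

With an open pyodbc cursor, the query could be run as cursor.execute(COLUMN_LENGTH_QUERY, 'dbo', 'tblImport') to list every column's declared length, not just ProjectDescription.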

Columns:

Column Name          | Data Type      | Allow Nulls
ProjectUID           | varchar(60)    | Unchecked
Framework            | char(5)        | Checked
Partner              | char(3)        | Checked
SCUID                | char(9)        | Checked
ClientName           | varchar(255)   | Checked
PartnerProjectNumber | varchar(40)    | Checked
ProjectName          | varchar(255)   | Checked
ProjectDescription   | varchar(MAX)   | Checked
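
The logged parameters in the traceback end in "....", which suggests more rows were sent than the one shown, so the offending value could sit in a different row or a different column. Below is a minimal sketch for checking every value against the limits in the table above. It assumes the rows are held as plain dictionaries (with pandas, dataframe.to_dict('records') would produce that shape). find_oversized and the sample rows are hypothetical.

```python
# Declared limits copied from the column list; None means no fixed limit (MAX).
COLUMN_LIMITS = {
    "ProjectUID": 60,
    "Framework": 5,
    "Partner": 3,
    "SCUID": 9,
    "ClientName": 255,
    "PartnerProjectNumber": 40,
    "ProjectName": 255,
    "ProjectDescription": None,
}


def find_oversized(rows, limits):
    """Return (row_index, column, length, limit) for every value that is too long."""
    problems = []
    for index, row in enumerate(rows):
        for column, limit in limits.items():
            value = row.get(column)
            if value is None or limit is None:
                continue
            length = len(str(value))  # measure non-string values by their text form
            if length > limit:
                problems.append((index, column, length, limit))
    return problems


# Hypothetical sample data: the second row has a Framework value that is too long.
sample_rows = [
    {"ProjectUID": "xxx", "Framework": "xxxxx", "ProjectDescription": "d" * 307},
    {"ProjectUID": "yyy", "Framework": "toolong", "ProjectDescription": "short"},
]
oversized = find_oversized(sample_rows, COLUMN_LIMITS)
```

This counts characters, not bytes, so it is only a first pass; values containing multi-byte characters may need a stricter check.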

Traceback:

  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 1216, in _execute_context
    cursor, statement, parameters, context
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\default.py", line 533, in do_executemany
    cursor.executemany(statement, parameters)
pyodbc.DataError: ('22001', '[22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The statement has been terminated. (3621)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_for_tests.py", line 258, in <module>
    outmsg = run()
  File "run_for_tests.py", line 252, in run
    process_result = process_data(file_type, file_name, relative_path, dict_clients)
  File "run_for_tests.py", line 204, in process_data
    update_sql_tables(df_base)
  File "server_interaction.py", line 74, in update_sql_tables
    dataframe.to_sql('tblImport', eng, schema='dbo', if_exists='append', index=False, dtype=dict_SQL_dtypes)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\generic.py", line 2532, in to_sql
    dtype=dtype, method=method)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
    chunksize=chunksize, dtype=dtype, method=method)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\io\sql.py", line 1174, in to_sql
    table.insert(chunksize, method=method)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\io\sql.py", line 686, in insert
    exec_insert(conn, keys, chunk_iter)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\io\sql.py", line 599, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 980, in execute
    return meth(self, multiparams, params)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\sql\elements.py", line 273, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 1099, in _execute_clauseelement
    distilled_params,
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 1240, in _execute_context
    e, statement, parameters, cursor, context
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 1458, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\util\compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\util\compat.py", line 276, in reraise
    raise value.with_traceback(tb)
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\base.py", line 1216, in _execute_context
    cursor, statement, parameters, context
  File "AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\sqlalchemy\engine\default.py", line 533, in do_executemany
    cursor.executemany(statement, parameters)
  sqlalchemy.exc.DataError: (pyodbc.DataError) ('22001', '[22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [22001] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The statement has been terminated. (3621)') [SQL: 'INSERT INTO dbo.[tblImport] ([ProjectUID], [Framework], [Partner], [SCUID], [ClientName], [PartnerProjectNumber], [ProjectName], [ProjectDescription]) VALUES (?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (('xxx', 'xxxxx', 'xxx', 'xxxxxxx', 'ABC', 123, 'ABC', 'Provide competition facilities. Build this facility with a capacity of 9000 spectators  ... (9 characters truncated) ... m pool and diving facilities and athletes changing facilities to a competion standard and then, post games, convert it to a council operated facility'), ....)] (Background on this error at: http://sqlalche.me/e/9h9h)
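
One way to narrow this down, sketched under assumptions: send the rows one at a time and record which ones the database rejects. insert_row below is a hypothetical callable standing in for whatever performs a single-row insert (for example, a to_sql call on a one-row slice of the dataframe). The fake version here just enforces a 5-character limit so the example runs without a database.

```python
def find_failing_rows(rows, insert_row):
    """Try each row on its own and collect (index, error message) for failures."""
    failures = []
    for index, row in enumerate(rows):
        try:
            insert_row(row)
        except Exception as exc:  # broad on purpose: driver error types vary
            failures.append((index, str(exc)))
    return failures


def fake_insert(row):
    """Stand-in for a real insert; rejects Framework values over 5 characters."""
    if len(row["Framework"]) > 5:
        raise ValueError("String or binary data would be truncated.")


rows = [{"Framework": "abcde"}, {"Framework": "abcdefg"}, {"Framework": "ab"}]
failures = find_failing_rows(rows, fake_insert)
```

Row-by-row inserts are slow, so this is meant for diagnosis on a failing batch rather than for the regular import.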

asked Feb 14, 2019 at 17:12

EllaP


I am running into the same issue. Have you solved it somehow?

– Sam Al-Ghammari
Commented Apr 11, 2019 at 14:48

@debuggingXD Unfortunately nothing yet. I ended up nuking the database and rebuilding it with the column set to varchar(max) from the start.

– EllaP
Commented Apr 13, 2019 at 21:06
