
I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. It seems easy to do with pandas, but I hit an error that doesn't make sense to me:

I have a DataFrame with 8 columns. Here is the column index:

metricDF.columns

Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)

I then use to_sql to append the data to MySQL:

metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')

I get a strange error about a column being "nan":

OperationalError: (1054, "Unknown column 'nan' in 'field list'")

As you can see, all my columns have names. I realize MySQL writing support in pandas appears to be under development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.
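For reference, a minimal sketch of what I'm doing end-to-end (the connection details and table name are placeholders):

import numpy as np
import pandas as pd
import MySQLdb

# placeholder connection details
con = MySQLdb.connect(host='localhost', user='user', passwd='pw', db='mydb')

# a tiny stand-in for one of the tables read from ASCII; the missing
# value in VALUE2_015 becomes NaN
metricDF = pd.DataFrame({'FID': [1, 2],
                         'VALUE_010': [3.5, 4.2],
                         'VALUE2_015': [7.1, np.nan]})

# raises OperationalError: (1054, "Unknown column 'nan' in 'field list'")
metricDF.to_sql(con=con, name='metrics_001', if_exists='append', flavor='mysql')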

3 Answers


Update: starting with pandas 0.15, to_sql supports writing NaN values (they will be written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208).
Pandas 0.15 is due to be released in the coming October, and the feature is already merged in the development version.
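For completeness, a minimal sketch of the post-0.15 behaviour using a SQLAlchemy engine (the connection string and table name are placeholders):

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; adjust user/password/host/database
engine = create_engine('mysql+mysqldb://user:password@localhost/mydb')

df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# with pandas >= 0.15, the NaN in VALUE_010 is written as NULL
df.to_sql('metrics', engine, if_exists='append', index=False)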


This is probably due to NaN values in your table; it is a known shortcoming at the moment that the pandas SQL functions don't handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).

As a workaround at this moment (for pandas versions 0.14.1 and lower), you can manually convert the NaN values to None with:

df2 = df.astype(object).where(pd.notnull(df), None)

and then write the dataframe to SQL. This, however, converts all columns to object dtype. Because of this, you have to create the database table based on the original dataframe, e.g. if your first row does not contain NaNs:

df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')

3 Comments

Awesome! Totally worked. You gotta love a simple one line solution like this. Thanks.
Note that this workaround will not remove NaT values from datetime64 columns (at least not when I tried it)
@aensm Thanks for noting, this bug will also be resolved in 0.15.

Using the previous solution will change the column dtype from float64 to object.

I have found a better solution: just add the following _write_mysql function:

import numpy as np
from pandas.io import sql

def _write_mysql(frame, table, names, cur):
    # back-tick quote column names so reserved words don't break the query
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    wildcards = ','.join([r'%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)

    # replace float NaN values with None so the MySQL driver sends NULL
    data = [[None if isinstance(y, float) and np.isnan(y) else y for y in x]
            for x in frame.values]

    cur.executemany(insert_query, data)

Then override the default implementation in pandas with it:

sql._write_mysql = _write_mysql

With this code, NaN values will be saved correctly in the database without altering the column type.
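A usage sketch, assuming pandas 0.13.x where sql.write_frame and the _write_mysql hook still exist (connection details and table name are placeholders):

import numpy as np
import pandas as pd
import MySQLdb
from pandas.io import sql

# assumes the _write_mysql override above has already been installed
con = MySQLdb.connect(host='localhost', user='user', passwd='pw', db='mydb')

df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# NaN in VALUE_010 reaches MySQL as NULL; the float64 dtype is untouched
sql.write_frame(df, 'metrics', con, flavor='mysql', if_exists='append')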

4 Comments

Note that this will not work with pandas 0.14 and above (there was a refactor in pandas 0.14)
I have just discovered that. I am trying to find a similar solution for pandas 0.14
You can eg do the check in maybe_asscalar (github.com/pydata/pandas/blob/master/pandas/io/sql.py#L580)
I'm using Python 3.6 with pandas 1.1.5, SQLAlchemy 1.3.13 and cx_Oracle 8.1.0. In my code I'm using the pandas to_sql method to insert data from a dataframe into an existing Oracle table, with the dtype argument indicating what data types the various columns have. Sometimes the source data is missing certain columns; I then insert those columns in the right place and assign np.nan as values. Those columns are supposed to contain integers. The insert fails with the error message: expecting number. I also tried None as values, but to no avail. Is there no solution for Oracle?

I had a similar situation while converting some Postgres databases to MySQL.
I tried other solutions until I found this post.
The approach that worked cleanly was inserting this line after loading the rows into a DataFrame variable:


data = data.astype(object).where(pandas.notnull(data), None)
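In context, a rough sketch of how this fits into the Postgres-to-MySQL copy (table names and connection strings are placeholders):

import pandas
from sqlalchemy import create_engine

# placeholder connection strings
pg_engine = create_engine('postgresql://user:password@localhost/source_db')
my_engine = create_engine('mysql+mysqldb://user:password@localhost/target_db')

data = pandas.read_sql_table('some_table', pg_engine)

# replace NaN with None so MySQL receives NULL instead of the string 'nan'
data = data.astype(object).where(pandas.notnull(data), None)

data.to_sql('some_table', my_engine, if_exists='append', index=False)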

Comments
