In my database DDL (SQL Server) I have columns with default values:
my_database_field varchar(255) NOT NULL
DEFAULT 'NVT'
However, when I try to insert data into this table using DataFrame.to_sql() I get the following error:
Cannot insert the value NULL into column 'my_database_field', table 'MYDB.dbo.my_table'; column does not allow nulls. INSERT fails.
Here is the code:
with engine.begin() as conn:
    dataframe.to_sql(table_name, conn, index=False, if_exists='append')
The column 'my_database_field' has NaN values:
data['my_database_field']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
4154 NaN
4155 NaN
4156 NaN
4157 NaN
4158 NaN
This seems to be because the to_sql method replaces NaN values with NULLs (which is probably how it should behave).
I want an option for the to_sql method to let the database apply the DEFAULT value defined in the schema for NaN values, instead of inserting NULL directly.
I've tried replacing all NaNs in the dataframe with empty strings, but this suffers from the same underlying problem: the empty strings are propagated to the database and that is what appears, not the default value specified in the DDL.
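The replacement I tried was essentially this (a sketch; the exact call isn't shown above):

# Sketch of the empty-string attempt: every NaN becomes '', and the empty
# string is then inserted verbatim rather than the DDL default firing.
dataframe = dataframe.fillna('')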
The only way I could get it to work was to not use pandas' DataFrame.to_sql method at all, but instead load the data from a CSV file and write it row by row into the database using basic SQLAlchemy SQL Expression constructs.
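That workaround looked something like the sketch below; this is my assumption of its shape, with a hypothetical CSV file name and the connection string elided. The key point is that each row dict drops its NaN entries, so those columns are omitted from the INSERT and the DEFAULT can fire:

import pandas as pd
from sqlalchemy import MetaData, Table, create_engine

engine = create_engine('mssql+pyodbc://...')  # connection string elided
table = Table('my_table', MetaData(), autoload_with=engine)  # reflect the existing table

dataframe = pd.read_csv('my_data.csv')  # hypothetical file name

with engine.begin() as conn:
    for row in dataframe.to_dict(orient='records'):
        # Drop the NaN entries so those columns are omitted from the INSERT,
        # which lets SQL Server apply the DEFAULT from the DDL.
        params = {k: v for k, v in row.items() if not pd.isna(v)}
        conn.execute(table.insert().values(**params))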
I've opened an issue on the pandas GitHub: https://github.com/pandas-dev/pandas/issues/42408
Edit: As per @Larnu's comment, the key bit of information I was missing was that:
A DEFAULT value is only used when you don't specify the column in the INSERT. If you (or the application) explicitly supplies a NULL value, then a NULL value will be inserted. DEFAULT isn't a "use this value when NULL is supplied" property, it's a "use this value when the column is omitted" property.
The problem now becomes how to elegantly handle inserting a large dataframe where some columns contain some NaNs. I'd like to avoid having to go row by row to insert the data with a check to see if any values in that row are NaN.
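One possible way to avoid a pure per-row loop, sketched here reusing the imports, engine, table, and dataframe from the previous sketch, is to bucket rows by which columns are non-null and issue one bulk INSERT per bucket:

from collections import defaultdict

# Bucket rows by the set of columns that actually have values, so each
# bucket can go through a single executemany-style call.
buckets = defaultdict(list)
for row in dataframe.to_dict(orient='records'):
    params = {k: v for k, v in row.items() if not pd.isna(v)}
    buckets[frozenset(params)].append(params)

with engine.begin() as conn:
    for rows in buckets.values():
        # Columns missing from these dicts never appear in the INSERT,
        # so SQL Server falls back to the DDL defaults for them.
        conn.execute(table.insert(), rows)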
Edit 2: I solved this problem by using the ORM as follows:
import logging

import numpy as np
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyModel(Base):
    __tablename__ = 'my_table'

    my_id = Column(String, primary_key=True)
    my_integer_field = Column(Integer)
    my_database_field = Column(String, server_default='NVT')

    @classmethod
    def from_dict(cls, input_dict: dict):
        """Build a model from a dict, mapping NaN values to None."""
        for k, v in input_dict.items():
            logging.debug(f"Key: {k} - Value: {v}")
            if isinstance(v, str):
                # np.isnan() raises TypeError on strings, so skip them.
                continue
            try:
                if np.isnan(v):
                    input_dict[k] = None
            except TypeError as e:
                logging.debug(e)
                logging.debug(f"Could not test if: {type(v)}:{v} was NaN")
        return cls(**input_dict)
As you can see, I have added a custom class method called 'from_dict' which builds the model from a dictionary. For each value I check whether it is np.nan and, if it is, set it to None instead. This seems to make the ORM omit that field from the INSERT it generates under the hood, and therefore lets the database use the default value. The np.isnan() method only works for numbers, not strings, so strings are skipped outright, and for any other incompatible type I just catch the TypeError, log some info, and move on.

For string columns it seems I must also set 'server_default' in the model itself. If anyone knows more about what's going on here, please feel free to share.
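For completeness, here is a sketch of how the model might be used to load the dataframe; the session handling is my assumption (using the engine and dataframe from earlier), not part of the original:

from sqlalchemy.orm import Session

with Session(engine) as session:
    session.add_all(MyModel.from_dict(row)
                    for row in dataframe.to_dict(orient='records'))
    session.commit()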
Another workaround suggested along the same lines: use .fillna('NVT') to replace the NaN values with the column default specified in the database before calling to_sql.
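A minimal sketch of that idea, reusing the engine, dataframe, and table_name from earlier and assuming you maintain the column-to-default mapping by hand:

# Hypothetical mapping from column name to the default declared in the DDL.
ddl_defaults = {'my_database_field': 'NVT'}

with engine.begin() as conn:
    dataframe.fillna(value=ddl_defaults).to_sql(
        table_name, conn, index=False, if_exists='append')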