I have a set of records that I need to insert into a SQL Server database using pymssql. These records contain the probability scores from a predictive model along with a number of other columns. For testing purposes, I have created an Excel file with the values I want to insert into the SQL Server table.

Some of the records in these columns have null values which are causing issues when I try to insert them.

Below is my code:

import pymssql
import pandas as pd

conn = pymssql.connect(server='server name', user='uid', password='pwd',
                       database='dbname')
cursor = conn.cursor()

# Read scores from Excel
df = pd.read_excel("Sample_Score_Records.xlsx")

# Parameterized INSERT; the adjacent string literals are joined into one statement
insert_sql = (
    "INSERT INTO dbo.ANLY_SCORE([id],[scoredatetime],[score1],[score2],"
    "[model_name],[score_id],[updated_revenue_ind]) "
    "values(%s,%d,%d,%d,%s,%s,%s)"
)

for index, row in df.iterrows():
    cursor.execute(insert_sql, (row['id'], row['scoredatetime'], row['score1'],
                                row['score2'], row['model_name'], row['score_id'],
                                row['updated_revenue_ind']))
    conn.commit()
conn.close()

The code works fine when I remove the columns with null values from the insert statement entirely, but it returns the error below when I include them.

Error:

ProgrammingError: (207, "Invalid column name 'nan'.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")

Below is how my data looks:

ID           scoredatetime           Score1     Score2    Model Name  Score_id      updated_revenue_ind
0015CF3EE2   10/17/2018 11:22:04 AM  0.2564859  0.365465  Model1      sana87y47164  y
0015CF3EE3   10/17/2018 11:22:04 AM  0.215485   0.458962  Model1      sana87y47165  n
0015CF3EE4   10/17/2018 11:22:04 AM  0.3458963  0.145893  Model1      sana87y47166  NULL
0015CF3EE5   10/17/2018 11:22:04 AM  0.265489   0.987513  Model1      sana87y47167  y
0015CF3EE6   10/17/2018 11:22:04 AM  0.324896   0.154787  Model1      sana87y47168  y
0015CF3EE7   10/17/2018 11:22:04 AM  0.386492   0.482172  Model1      sana87y47169  y
0015CF3EE8   10/17/2018 11:22:04 AM  0.2489714  0.897562  Model1      sana87y47170  y
0015CF3EE9   10/17/2018 11:22:04 AM  0.402365   0.147851  Model1      sana87y47171  n
0015CF3EE10  10/17/2018 11:22:04 AM  0.380236   0.278968  Model1      sana87y47172  NULL
0015CF3EE11  10/17/2018 11:22:04 AM  0.361245   0.478938  Model1      sana87y47173  NULL

Any help is greatly appreciated!

  • looks as though there's a bug in pymssql, not sure if it shouldn't be allowing column names to be referred to like that… Commented Oct 17, 2018 at 20:01
  • maybe have a look at: stackoverflow.com/questions/41722498/… instead of writing your own code to do this Commented Oct 17, 2018 at 20:03

2 Answers

Invalid column name 'nan'

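The problem is that pandas does not store null values as Python None; it stores them as NaN, which is a float. pymssql does not treat NaN as a SQL NULL. It apparently writes it into the statement as the bare word nan, which SQL Server then tries to resolve as a column name, hence the error.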
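To illustrate how a NaN can end up looking like a column name, here is a toy sketch of client-side parameter rendering. The render_param function is hypothetical and is not pymssql's actual code; it only assumes that strings get quoted, None becomes NULL, and everything else is converted with str().

```python
def render_param(value):
    """Toy stand-in for client-side parameter quoting (NOT pymssql's real code)."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        # Quote strings and double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"
    # Numbers (including float NaN) are written out unquoted.
    return str(value)

missing = float("nan")  # what pandas uses for an empty cell

print(render_param("y"))      # 'y'
print(render_param(None))     # NULL
print(render_param(missing))  # nan
```

An unquoted nan inside a VALUES list is not a number or string literal in T-SQL, so the server reads it as an identifier, which is consistent with the "Invalid column name 'nan'" message.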

You probably should just install SQLAlchemy and then use df.to_sql to insert the rows:

import pandas as pd
from sqlalchemy import create_engine
# ...

#Read Scores from Excel
df = pd.read_excel("Sample_Score_Records.xlsx")

# rename one DataFrame column to match table column name
df.rename({'Model Name': 'model_name'}, axis='columns', inplace=True)

# insert rows
engine = create_engine('mssql+pymssql://uid:pwd@servername/dbname')
df.to_sql('ANLY_SCORE', engine, schema='dbo', if_exists='append', index=False)

1 Comment

Thank you! I will give it a try and let you know if that worked.

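Just replace the DataFrame's NaNs with None before inserting: df = df.where(pd.notnull(df), None). Note that where() returns a new DataFrame, so the result has to be assigned back. Depending on your pandas version, float columns may turn None back into NaN; converting with df.astype(object) first avoids that.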
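If you would rather keep an explicit row-by-row insert, the same replacement can be done per row with the standard library only. This is a minimal sketch; nan_to_none and clean_row are hypothetical helper names, and the sample row just mirrors the shape of the data in the question.

```python
import math

def nan_to_none(value):
    """Return None for a float NaN, otherwise the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

def clean_row(values):
    """Build a parameter tuple with every NaN replaced by None."""
    return tuple(nan_to_none(v) for v in values)

# Example row shaped like the question's data; the last field is missing.
raw = ("0015CF3EE4", "10/17/2018 11:22:04 AM", 0.3458963, 0.145893,
       "Model1", "sana87y47166", float("nan"))
params = clean_row(raw)
print(params[-1])  # None
```

The cleaned tuple would then be passed as the second argument to cursor.execute in place of the raw row values, so that the driver sends NULL for the missing field.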
