
I have a pandas dataframe with 27 columns and ~45k rows that I need to insert into a SQL Server table.

I am currently using the code below, and it takes about 90 minutes to insert:

conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                      'Server=@servername;'
                      'Database=dbtest;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()  # create cursor

for index, row in t6.iterrows():
    cursor.execute("insert into dbtest.dbo.test (col1, col2, col3, col4, col5, col6, col7, col8, "
                   "col9, col10, col11, col12, col13, col14, ..., col27) "
                   "values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
                   row['col1'], row['col2'], row['col3'], ..., row['col27'])

I have also tried loading with executemany, and that takes even longer to complete, at nearly 120 minutes.

I am really looking for a faster load time since I need to run this daily.


3 Answers


You can set fast_executemany in pyodbc itself (versions >= 4.0.19). It is off by default.

import pyodbc

server_name = 'localhost'
database_name = 'AdventureWorks2019'
table_name = 'MyTable'
driver = 'ODBC Driver 17 for SQL Server'

connection = pyodbc.connect(driver='{'+driver+'}', server=server_name,
                            database=database_name, trusted_connection='yes')
cursor = connection.cursor()
cursor.fast_executemany = True  # reduce number of round trips to the server on inserts

# form the SQL statement
columns = ", ".join(df.columns)
values = '(' + ', '.join(['?'] * len(df.columns)) + ')'
statement = "INSERT INTO " + table_name + " (" + columns + ") VALUES " + values

# extract values from the DataFrame into a list of tuples
insert = [tuple(x) for x in df.values]

cursor.executemany(statement, insert)
connection.commit()  # needed because pyodbc connections do not autocommit by default
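One gotcha not covered above (an assumption on my part about your data): pandas missing values (NaN/NaT) are not mapped to SQL NULL automatically, so if any columns contain them the insert can fail. A minimal sketch of the usual workaround, reusing the same df:

import pandas as pd

# Build plain Python tuples and map pandas missing values (NaN/NaT) to None,
# which pyodbc sends as SQL NULL.
insert = [
    tuple(None if pd.isna(value) else value for value in row)
    for row in df.itertuples(index=False, name=None)
]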

Or, if you prefer SQLAlchemy and working with DataFrames directly:

import sqlalchemy as db

engine = db.create_engine('mssql+pyodbc://@'+server_name+'/'+database_name+'?trusted_connection=yes&driver='+driver, fast_executemany=True)

df.to_sql(table_name, engine, if_exists='append', index=False)

See fast_executemany in the pyodbc wiki: https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API




I have worked through this in the past, and this was the fastest approach I could get working using SQLAlchemy.

import sqlalchemy as sa
engine = sa.create_engine(
    f'mssql://@{server}/{database}?trusted_connection=yes&driver={driver_name}',
    fast_executemany=True)  # Windows authentication
df.to_sql('Daily_Report', con=engine, if_exists='append', index=False)

If the engine is not working for you, you may have a different setup, so please see: https://docs.sqlalchemy.org/en/13/core/engines.html
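If that URL format still fails, another option (a sketch, not from the original answer; it assumes the server, database and driver_name variables used in this answer) is to hand SQLAlchemy a raw ODBC connection string through the odbc_connect query parameter:

import urllib.parse

import sqlalchemy as sa

# Build the same connection string pyodbc would use and URL-encode it for SQLAlchemy.
odbc_str = (
    f'DRIVER={{{driver_name}}};'
    f'SERVER={server};'
    f'DATABASE={database};'
    'Trusted_Connection=yes;'
)
engine = sa.create_engine(
    'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(odbc_str),
    fast_executemany=True,
)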

You should be able to create the variables needed above, but here is how I get the driver:

driver_name = ''
driver_names = [x for x in pyodbc.drivers() if x.endswith(' for SQL Server')]
if driver_names:
    driver_name = driver_names[-1]  # you may need to change [-1] to [-2] or another entry in driver_names if the wrong driver is picked
if driver_name:
    conn_str = f'''DRIVER={driver_name};SERVER='''
else:
    print('(No suitable driver found. Cannot connect.)')

3 Comments

The above throws an error: 'TypeError: to_sql() got an unexpected keyword argument 'uri''. I tried creating the engine in a separate variable, but that doesn't work either.
@user10 I'm sorry, I assumed dask and pandas had the same parameters for the to_sql function. I'll update my answer with the pandas way.
So, I have a daily report that I use dask with on 20 million rows and it takes about an hour, using all of the same methods but passing con for pandas instead of uri for dask. I think to_sql should have similar performance on dask and pandas, so hopefully the above code now works for you and it is quick given a small dataframe.
0

You can try the method='multi' option built into pandas to_sql.

df.to_sql('table_name', con=engine, if_exists='replace', index=False, method='multi')

The 'multi' method allows you to 'Pass multiple values in a single INSERT clause', per the documentation. I found it to be pretty efficient.
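One caveat to add (from SQL Server's documented limits, not from the answer): a single statement can carry at most 2100 parameters, and method='multi' spends one parameter per cell, so chunksize should keep chunksize * number_of_columns under that limit. A rough sketch, reusing the engine from the answers above:

# Keep each multi-row INSERT under SQL Server's 2100-parameter limit;
# with the 27 columns from the question this works out to 2100 // 27 = 77 rows per chunk.
rows_per_chunk = 2100 // len(df.columns)

df.to_sql('table_name', con=engine, if_exists='replace', index=False,
          method='multi', chunksize=rows_per_chunk)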

