DataFrame to SQL Server using executemany from pyodbc

Question

I am trying to load data from a DataFrame into SQL Server using pyodbc, which inserts row by row, and it's very slow.

I have tried two approaches found online (on Medium), and I don't see any improvement in performance.

Trying to run in SQL Azure, so SQLAlchemy is not an easy connection method. Please find below the approaches I followed. Is there any other way to improve the performance of the bulk load?

Method 1

cursor = sql_con.cursor()
cursor.fast_executemany = True
for row_count in range(0, df.shape[0]):
    chunk = df.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Method 2

cursor = sql_con.cursor()
for row_count in range(0, ProductInventory.shape[0]):
    chunk = ProductInventory.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Can anyone tell me why the performance has not improved by even 1%? It still takes the same time.

Did you ever try DataFrame.to_sql with the if_exists='append' argument?
– Parfait, Commented Apr 7, 2020 at 15:23

Gord Thompson · Accepted Answer · 2020-04-07 16:04:25Z


Trying to run in SQL Azure, so SQLAlchemy is not an easy connection method.

Perhaps you just need to get over that hurdle first. Then you can use pandas' to_sql together with fast_executemany=True. For example:

from sqlalchemy import create_engine
#
# ...
#
engine = create_engine(connection_uri, fast_executemany=True)
df.to_sql("table_name", engine, if_exists="append", index=False)

If you have a working pyodbc connection string you can convert it to a SQLAlchemy connection URI like so:

import urllib.parse
connection_uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(connection_string)
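
To make the conversion a bit more concrete, here is a minimal sketch using only the standard library. The connection string is entirely hypothetical (driver name, server, database, user and password are placeholders, not values from the question). It just shows that quote_plus escapes the characters that would otherwise break the URI, and that the original string can be recovered unchanged.

```python
import urllib.parse

# Hypothetical pyodbc connection string; every value is a placeholder.
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:myserver.database.windows.net,1433;"
    "DATABASE=mydb;UID=myuser;PWD=p@ss w0rd+1"
)

# Percent-encode the whole string so it can travel as one query parameter.
encoded = urllib.parse.quote_plus(connection_string)
connection_uri = "mssql+pyodbc:///?odbc_connect=" + encoded

# Decoding gives back exactly the string pyodbc would have received.
round_trip = urllib.parse.unquote_plus(encoded)
print(round_trip == connection_string)
```

The resulting connection_uri is what you would pass to create_engine in the snippet above.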

edited Apr 7, 2020 at 16:04

answered Apr 7, 2020 at 15:46


2 Comments

Sankeertan Samrat Over a year ago

When I use SQLAlchemy, this is the error it throws: An attempt to complete a transaction has failed. No corresponding transaction found.

Gord Thompson Over a year ago

I am unable to reproduce your issue. If you require further assistance please ask a new question that includes a minimal reproducible example.

Eric Truett · Accepted Answer · 2020-04-07 14:59:45Z

A couple of things:

Why are you iterating over ProductInventory twice?
Shouldn't the executemany call happen after you've built up the entire tuple_of_tuples, or a batch of them?
The pyodbc documentation says that "running executemany() with fast_executemany=False is generally not going to be much faster than running multiple execute() commands directly." So you need to set cursor.fast_executemany=True in both examples (see https://github.com/mkleehammer/pyodbc/wiki/Cursor for more details/examples). I'm not sure why it is omitted in example 2.
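
To illustrate the second point, here is a rough sketch with a hypothetical stand-in cursor (CountingCursor is not part of pyodbc, it only counts calls). It mimics the outer loop from the question, where each slice is row_count:row_count + 1, and compares it with handing over all rows in one call. The table name and sample rows are made up for the illustration.

```python
class CountingCursor:
    """Hypothetical stand-in for a DB cursor; it only records what it is given."""

    def __init__(self):
        self.calls = 0
        self.largest_batch = 0

    def executemany(self, sql, params):
        self.calls += 1
        self.largest_batch = max(self.largest_batch, len(params))


rows = [(i, i + 100) for i in range(1, 11)]  # 10 sample rows
sql = "INSERT INTO some_table ([x],[Y]) VALUES (?,?)"

# Pattern from the question: every slice contains exactly one row,
# so executemany never sees more than one parameter tuple per call.
one_by_one = CountingCursor()
for start in range(len(rows)):
    chunk = rows[start:start + 1]
    one_by_one.executemany(sql, chunk)

# Handing over all rows at once: a single call with ten parameter tuples.
all_at_once = CountingCursor()
all_at_once.executemany(sql, rows)

print(one_by_one.calls, one_by_one.largest_batch)
print(all_at_once.calls, all_at_once.largest_batch)
```

With one row per call, executemany has nothing to batch, which is consistent with seeing no speed-up regardless of the fast_executemany setting.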

Here is an example of how you can accomplish what I think you are trying to do. The math.ceil and the conditional expression in end_idx = ... account for the last batch, which may be smaller than the others. In the example output further down, with 10 rows and a batch size of 3, you end up with 4 batches, the last one having only 1 tuple.

import math

df = ProductInventory
batch_size = 500
num_batches = math.ceil(len(df) / batch_size)

cursor = sql_con.cursor()
cursor.fast_executemany = True  # see the third point above

for i in range(num_batches):
    start_idx = i * batch_size
    # the last batch may contain fewer than batch_size rows
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Example Output:

=== Executing: ===
import math
import pandas as pd

df = pd.DataFrame({'a': range(1,11), 'b': range(101,111)})

batch_size = 3
num_batches = math.ceil(len(df)/batch_size)

for i in range(num_batches):
    start_idx = i * batch_size
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    print(tuple_of_tuples)

=== Output: ===
((1, 101), (2, 102), (3, 103))
((4, 104), (5, 105), (6, 106))
((7, 107), (8, 108), (9, 109))
((10, 110),)
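
As a possible variation on the batching above (a sketch, not something from the pyodbc docs): stepping range() by the batch size avoids both math.ceil and the end_idx conditional, because Python slicing simply stops at the end of the sequence. The helper name iter_batches is made up for this illustration, and plain lists of tuples stand in for the DataFrame rows.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        # Slicing past the end is safe; the last slice is just shorter.
        yield rows[start:start + batch_size]


# Same sample data as the example output above: 10 rows, batch size 3.
rows = list(zip(range(1, 11), range(101, 111)))
batches = list(iter_batches(rows, 3))
for batch in batches:
    print(batch)
```

Each yielded batch could then be passed to cursor.executemany in place of tuple_of_tuples.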
