I am trying to load data from a DataFrame into SQL Server using pyodbc, which inserts row by row and is very slow.

I have tried two approaches that I found online (on Medium), and I don't see any improvement in performance.

I am running against Azure SQL, so SQLAlchemy is not an easy connection method. Below are the approaches I followed. Is there any other way to improve bulk-load performance?

Method 1

cursor = sql_con.cursor()
cursor.fast_executemany = True
for row_count in range(0, df.shape[0]):
    chunk = df.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Method 2

cursor = sql_con.cursor()
for row_count in range(0, ProductInventory.shape[0]):
    chunk = ProductInventory.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Can anyone tell me why the performance has not improved even by 1%? It still takes the same amount of time.

  • Did you ever try DataFrame.to_sql with the if_exists='append' argument? Commented Apr 7, 2020 at 15:23

2 Answers

1

Trying to run in SQL azure so SQL Alchemy is not an easy connection method.

Perhaps you just need to get over that hurdle first. Then you can use pandas' to_sql along with fast_executemany=True. For example:

from sqlalchemy import create_engine
#
# ...
#
engine = create_engine(connection_uri, fast_executemany=True)
df.to_sql("table_name", engine, if_exists="append", index=False)

If you have a working pyodbc connection string you can convert it to a SQLAlchemy connection URI like so:

import urllib.parse
connection_uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(connection_string)
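
To make the conversion concrete, here is a minimal sketch that uses only the standard library. The server, database, and credential values are hypothetical placeholders, and the driver name is an assumption about what is installed on your machine, so substitute your own.

```python
import urllib.parse

# Hypothetical pyodbc connection string; every value below is a placeholder.
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;"
    "UID=myuser;"
    "PWD=p@ssw0rd"
)

# quote_plus percent-encodes characters such as '{', ';', '=' and '@',
# and turns spaces into '+', so the whole string fits in one query parameter.
connection_uri = (
    "mssql+pyodbc:///?odbc_connect="
    + urllib.parse.quote_plus(connection_string)
)
print(connection_uri)
```

Decoding the query parameter with urllib.parse.unquote_plus gives back the original connection string, which is a quick way to check that nothing was mangled.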

2 Comments

When I use SQLAlchemy, this is the error it throws: An attempt to complete a transaction has failed. No corresponding transaction found
I am unable to reproduce your issue. If you require further assistance please ask a new question that includes a minimal reproducible example.
1

A few things:

  1. Why are you iterating over ProductInventory twice?

  2. Shouldn't the executemany call happen after you've built up the entire tuple_of_tuples, or a batch of them?

  3. The pyodbc documentation says that "running executemany() with fast_executemany=False is generally not going to be much faster than running multiple execute() commands directly." So you need to set cursor.fast_executemany=True in both examples (see https://github.com/mkleehammer/pyodbc/wiki/Cursor for more details/examples). I'm not sure why it is omitted in example 2.
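
The call structure suggested in points 1 and 2 can be tried without a SQL Server instance. The sketch below uses the standard-library sqlite3 module purely as a stand-in for pyodbc, an assumption made so that it runs anywhere. sqlite3 has no fast_executemany attribute, so only the shape of the loop carries over. The table name demo and the sample rows are hypothetical.

```python
import sqlite3

# sqlite3 stands in for pyodbc here; with pyodbc you would also set
# cursor.fast_executemany = True before inserting.
con = sqlite3.connect(":memory:")
cursor = con.cursor()
cursor.execute("CREATE TABLE demo (x INTEGER, y INTEGER)")

rows = [(i, i + 100) for i in range(1, 11)]  # hypothetical sample data

# One executemany call for the whole list of rows, instead of one call
# per row nested inside a second loop over the same data.
cursor.executemany("INSERT INTO demo (x, y) VALUES (?, ?)", rows)
con.commit()

inserted = cursor.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(inserted)
```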

Here is an example of how you can accomplish what I think you are trying to do. The math.ceil and the conditional expression in end_idx = ... account for the last batch, which may be odd-sized. In the example output further below, there are 10 rows and a batch size of 3, so you end up with 4 batches, the last one having only 1 tuple.

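import math

df = ProductInventory
batch_size = 500
num_batches = math.ceil(len(df) / batch_size)

cursor = sql_con.cursor()
cursor.fast_executemany = True  # without this, executemany is generally no faster than repeated execute calls

for i in range(num_batches):
    start_idx = i * batch_size
    # The last batch may be smaller than batch_size.
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

sql_con.commit()  # pyodbc connections do not autocommit by default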

Example Output:

=== Executing: ===
import math
import pandas as pd

df = pd.DataFrame({'a': range(1,11), 'b': range(101,111)})

batch_size = 3
num_batches = math.ceil(len(df)/batch_size)

for i in range(num_batches):
    start_idx = i * batch_size
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    print(tuple_of_tuples)

=== Output: ===
((1, 101), (2, 102), (3, 103))
((4, 104), (5, 105), (6, 106))
((7, 107), (8, 108), (9, 109))
((10, 110),)
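
As a possible simplification, the same batches can be produced by stepping through the row indices with a stride, because Python slices stop at the end of the sequence on their own. This sketch uses a plain list of tuples instead of a DataFrame so that it runs without pandas. The helper name iter_batches is hypothetical.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        # Slicing past the end is safe: the last batch is simply shorter.
        yield rows[start:start + batch_size]


rows = [(a, a + 100) for a in range(1, 11)]  # same 10 rows as above
batches = list(iter_batches(rows, 3))
for batch in batches:
    print(batch)
```

With a DataFrame, df.iloc[start:start + batch_size] clips at the end in the same way, so neither math.ceil nor the conditional on the last batch is strictly required.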
