
This is probably a highly discussed topic, but I have not found "the answer" yet. I insert big tables into Azure SQL Server monthly. I process the raw data in memory with Python and pandas, and I really like the speed and versatility of pandas.

Sample DataFrame size: 5.2 million rows, 50 columns, ~250 MB of memory allocated.

Transferring the processed DataFrame to Azure SQL Server is always the bottleneck. For the transfer I used to_sql (with SQLAlchemy) and tried fast_executemany, various chunksize values, and other arguments.
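Roughly what I have been doing looks like the sketch below (server, database, credentials and table name are placeholders; df is the DataFrame described above):

```python
import urllib
from sqlalchemy import create_engine

# Placeholder connection string for an Azure SQL database
params = urllib.parse.quote_plus(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword"
)

engine = create_engine(
    f"mssql+pyodbc:///?odbc_connect={params}",
    fast_executemany=True,  # batch the parameterized INSERTs instead of one round trip per row
)

# df is the 5.2M-row DataFrame already built in memory
df.to_sql("my_table", engine, if_exists="append", index=False, chunksize=10_000)
```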

The fastest way I have found so far is to export the DataFrame to a CSV file and then BULK INSERT that into SQL Server using SSMS, bcp, Azure Blob Storage, etc.
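Driving that route from Python looks roughly like this (all names are placeholders; on Azure SQL Database the CSV typically has to be staged in Blob Storage behind an external data source, called MyAzureBlob here, which would need to exist already):

```python
import pyodbc

# Dump the DataFrame to a flat file, then upload it to the blob container
# backing the 'MyAzureBlob' external data source (upload step not shown).
df.to_csv("my_table.csv", index=False, header=False)

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword",
    autocommit=True,
)
conn.execute(r"""
    BULK INSERT dbo.my_table
    FROM 'my_table.csv'
    WITH (DATA_SOURCE = 'MyAzureBlob',
          FORMAT = 'CSV',
          FIELDTERMINATOR = ',',
          ROWTERMINATOR = '\n',
          TABLOCK);
""")
conn.close()
```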

However, I am looking into bypassing the CSV file creation, since my DataFrame already has all the dtypes set and is loaded in memory.

What is your fastest way of transferring this DataFrame to SQL Server using only Python/pandas? I am also interested in approaches such as binary file transfer, as long as the flat-file export/import step is eliminated.

Thanks

1 Answer


I had a similar issue, and I resolved it with a BCP-based utility. The basic bottleneck is that to_sql seems to do RBAR ("Row-By-Agonizing-Row") data entry, i.e. one INSERT statement per record. Going the bulk-insert route saved me a lot of time. The real benefit seemed to come once I crossed the threshold of 1M+ records, which you are well ahead of.

Link to the utility: https://github.com/yehoshuadimarsky/bcpandas
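A minimal usage sketch (connection details and table name are placeholders; the bcp command-line utility must be installed and on the PATH, and as far as I understand bcpandas stages a temporary CSV internally and pushes it with bcp):

```python
from bcpandas import SqlCreds, to_sql

# Placeholder credentials for the target Azure SQL database
creds = SqlCreds(
    server="myserver.database.windows.net",
    database="mydb",
    username="myuser",
    password="mypassword",
)

# Hands the DataFrame off to bcp for a bulk load,
# so the per-row INSERT overhead disappears.
to_sql(df, "my_table", creds, index=False, if_exists="append")
```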


1 Comment

5204251 rows copied. Network packet size (bytes): 4096. Clock time (ms.) total: 341391, average: 15244.25 rows per sec. The printout and ease of use are added benefits!
