This is probably a highly discussed topic, but I have not found "the answer" yet. Every month I insert large tables into Azure SQL Server. I process the raw data in memory with Python and Pandas, and I really like the speed and versatility of Pandas.
Sample DataFrame size = 5.2 million rows, 50 columns, 250 MB memory allocated
Transferring the processed Pandas DataFrame to Azure SQL Server is always the bottleneck. For the transfer I have been using to_sql (with SQLAlchemy) and have experimented with fast_executemany, various chunksize values, and other arguments, roughly as in the sketch below.
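For reference, this is approximately what my to_sql call looks like (server, database, credentials, table name, and chunksize are placeholders, not my real values):

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder DataFrame standing in for the real 5.2M-row, 50-column frame
df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": ["x", "y", "z"]})

# Connection string details are placeholders
engine = sa.create_engine(
    "mssql+pyodbc://user:password@myserver.database.windows.net/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,  # let pyodbc batch the parameterized INSERTs
)

df.to_sql(
    "my_table",
    engine,
    if_exists="append",
    index=False,
    chunksize=100_000,  # one of several chunk sizes I tried
)
```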
The fastest way I have found so far is to export the DataFrame to a CSV file and then BULK INSERT that file into SQL Server via SSMS, bcp, or Azure Blob Storage (sketch below).
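A rough sketch of that CSV route, with paths, table name, and connection details as placeholders (on Azure SQL Database the file generally has to be staged in Blob Storage and referenced via an external data source rather than a local path):

```python
import pandas as pd
import pyodbc

# Placeholder DataFrame standing in for the processed frame
df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": ["x", "y", "z"]})

# Dump to a flat file first -- this is the step I want to eliminate
df.to_csv(r"C:\temp\staging.csv", index=False, header=False)

# Connection details are placeholders
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=user;PWD=password"
)
conn.autocommit = True  # commit immediately so the bulk load persists

# On Azure SQL Database, replace the local path with a Blob-hosted file
# and add WITH (DATA_SOURCE = '...') pointing at an external data source.
conn.cursor().execute("""
    BULK INSERT dbo.my_table
    FROM 'C:\\temp\\staging.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK)
""")
conn.close()
```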
However, I would like to bypass the CSV file entirely, since the DataFrame already has all its dtypes set and is already loaded in memory.
What is the fastest way to transfer this DataFrame to SQL Server using only Python/Pandas? I am also interested in approaches such as binary transfer formats, as long as they eliminate the flat-file export/import step.
Thanks