
I have a list of files, and each file contains a large number of records separated by \n. I need to process those records in parallel and upload them to a SQL Server database. Could someone suggest the best way to do this with Python?

2 Answers


The best way might not be to upload in parallel, but to use SQL Server's bulk importing mechanisms,
e.g.
BULK INSERT
bcp
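
For example, a minimal sketch of driving bcp from Python with subprocess (the server, database, table, and file names are placeholders, not taken from the question):

import subprocess

# Hypothetical example: load a comma-delimited file into dbo.MyTable
# with the bcp command-line utility; adjust server, login, and paths.
subprocess.check_call([
    "bcp", "MyDatabase.dbo.MyTable", "in", "records.txt",
    "-S", "myserver",   # SQL Server instance
    "-T",               # trusted (Windows) authentication; use -U/-P otherwise
    "-c",               # character format; rows default to newline-terminated
    "-t", ",",          # field terminator
])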

EDIT:

If you need to process the records first, an approach I have often used is:
1) Bulk load the data into a staging table.
2) Process the data on the database.
3) Insert it into the main tables.

Steps 2 and 3 can be combined if the processing is simple enough. A sketch of this flow is given below.

This can be faster because there are fewer round trips to the server, and processing the data as a set rather than row by row is usually quicker.

Also, I think SQL Server will use more than one CPU for this processing, so you get your parallel processing for free.
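
A minimal sketch of that flow using pyodbc (the connection string, table names, column names, and file path are all placeholder assumptions, and the UPDATE stands in for whatever per-record processing is actually needed):

import pyodbc

# Placeholder connection string -- adjust the driver, server, and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=MyDatabase;Trusted_Connection=yes",
    autocommit=True)
cur = conn.cursor()

# 1) Bulk load the raw file into a staging table.
#    The path is resolved on the SQL Server machine, not the client.
cur.execute("""
    BULK INSERT dbo.StagingRecords
    FROM 'C:\\data\\records.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')
""")

# 2) Process the data on the database, set-based rather than row by row.
cur.execute("UPDATE dbo.StagingRecords SET value = UPPER(value)")

# 3) Insert into the main table and clear the staging table.
cur.execute("INSERT INTO dbo.MainRecords (id, value) "
            "SELECT id, value FROM dbo.StagingRecords")
cur.execute("TRUNCATE TABLE dbo.StagingRecords")
conn.close()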


1 Comment

I need to process every record from the file first and then upload it, not bulk insert the whole file as a table.

I would use a Pool; I have provided an example below. For optimum throughput you will want to batch your inserts to the database. A simple way to do this is to process all of your records in Python, then use the BULK INSERT tool from Mark's comment to do the inserts (a sketch of that combination follows the example). Inserting one record at a time will be slower, since your program has to wait for a network round trip to the SQL Server for each row.

from multiprocessing import Pool
import sys

def worker(record):
    # Per-record processing (parsing, validation, transformation, ...) goes here.
    print("Processing... %s" % record)

if __name__ == "__main__":
    pool = Pool(processes=8)
    for record in sys.stdin:
        pool.apply_async(worker, [record])

    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all workers to finish
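
Putting the two parts together, a rough sketch of the batched approach described above (process() is a stand-in for your real per-record logic, and the output file name is a placeholder):

from multiprocessing import Pool
import csv
import sys

def process(record):
    # Stand-in transformation -- replace with the real per-record processing.
    return record.strip().upper().split(",")

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        # pool.map distributes the records across the workers and
        # returns the processed rows in input order.
        rows = pool.map(process, sys.stdin)

    # Write the processed records to a file that BULK INSERT / bcp can load
    # in one batch, instead of inserting row by row over the network.
    with open("processed.csv", "w", newline="") as out:
        csv.writer(out).writerows(rows)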

