1

I need to read data from multiple SQL Server databases (around 200). Each of these databases has about 10 tables that I need to read from. What is the best way to do this in Java?

Thanks in advance

3
  • Sounds like a case for Hadoop? Commented May 30, 2012 at 1:48
  • What is your biggest concern? Speed? Memory? Anything else? Commented May 30, 2012 at 1:49
  • Speed is the main concern. This is going to be a nightly import process that has to finish in 2-3 hours, and there are 4 other applications to import from nightly, each with hundreds of MB of data going into our application. Commented May 30, 2012 at 1:53

2 Answers

1

Concurrency to the rescue.

To achieve the best throughput for your heavy workload, write your application as multithreaded from the start. Then you can speed it up or throttle it back by changing the number of threads, depending on performance constraints.

ExecutorService is a nice way to break down tasks in a scalable way. I would suggest you define each database-import task as a Callable, and then 'invoke' all the tasks from an ExecutorService.

I'd do something like this:

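import java.util.List;
import java.util.concurrent.*;

List<YourCallableImportJobs> work = yourFactory.getAllWork();
// This variable can be used to tweak performance.
// Begin with a low number and then ramp it up if it's too slow.
int nThreads = 10;
// Executors is the factory class for thread pools.
ExecutorService service = Executors.newFixedThreadPool(nThreads);
// T is the result type of your jobs; invokeAll blocks until all of them finish.
List<Future<T>> futures = service.invokeAll(work);
service.shutdown();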

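invokeAll returns once every task has completed, so you can then call get() on each Future to collect its result or to surface any exception the job threw. (If you submit the jobs one by one with submit() instead, you can poll each Future with isDone() to check when the work is finished.)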
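To make the pattern above concrete, here is a minimal self-contained sketch. The class name NightlyImport, the importJob helper, and the simulated row counts are all hypothetical placeholders; a real job would open a JDBC connection and read the tables instead of returning a fixed number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class NightlyImport {

    // Hypothetical job: stands in for "read ~10 tables from one database".
    // A real job would open a JDBC connection and run the queries here.
    static Callable<Integer> importJob(String databaseName, int simulatedRows) {
        return () -> {
            // Placeholder for the actual reads; returns the number of rows imported.
            return simulatedRows;
        };
    }

    // Runs all jobs on a fixed-size pool and returns the total row count.
    static int runAll(List<Callable<Integer>> jobs, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService service = Executors.newFixedThreadPool(nThreads);
        try {
            // Blocks until every job has finished (successfully or not).
            List<Future<Integer>> futures = service.invokeAll(jobs);
            int total = 0;
            for (Future<Integer> f : futures) {
                // get() rethrows a job's failure wrapped in ExecutionException.
                total += f.get();
            }
            return total;
        } finally {
            // Always release the pool's threads.
            service.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int i = 1; i <= 200; i++) {
            jobs.add(importJob("db" + i, 10));
        }
        System.out.println("Rows imported: " + runAll(jobs, 10));
    }
}
```

Keeping the thread count as a parameter of runAll makes it easy to experiment with different pool sizes against the 2-3 hour window.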

Finally, if you want concurrent access to each database (particularly your destination database), I recommend using a connection pooling mechanism such as c3p0. That way you don't spend too much time opening and closing connections. (You could even break each import down into individual queries, in which case connection pooling would help as well.)
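To illustrate what pooling buys you, here is a toy sketch of the idea only. It is not c3p0's API; TinyPool and FakeConnection are hypothetical names, and a real pool also handles validation, timeouts, and broken connections. The point is that connections are opened once and then borrowed and returned rather than opened and closed per query.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class TinyPool {

    // Hypothetical stand-in for an expensive-to-open resource such as a JDBC connection.
    static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
    }

    private final BlockingQueue<FakeConnection> idle;
    private final AtomicInteger opened = new AtomicInteger();

    TinyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // Connections are opened once, up front.
            idle.add(new FakeConnection(opened.incrementAndGet()));
        }
    }

    // Blocks until a connection is free, which also caps concurrent use at the pool size.
    FakeConnection borrow() throws InterruptedException {
        return idle.take();
    }

    // Hands a borrowed connection back so another thread can reuse it.
    void giveBack(FakeConnection c) {
        idle.add(c);
    }

    // Number of connections ever opened by this pool.
    int openedCount() {
        return opened.get();
    }
}
```

With a pool of size 2, a hundred borrow/giveBack cycles still leave openedCount() at 2, which is the saving the answer describes.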

Hope this helps


1 Comment

Thanks Amir for the solution, I will be trying this out.
0

Maintain a queue of database connections, keyed by the IP addresses of those databases, and use multithreading to connect to each database. As the work on a database finishes, close the connection to that database and remove it from the queue.
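A rough sketch of this queue-based approach, under some assumptions: the queue here holds database addresses rather than open connections, each worker thread takes the next address and would open and close its own connection, and the names QueueImport, importFrom, and the 10.0.0.x addresses are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class QueueImport {

    // Hypothetical stand-in for connecting to one database, reading it, and closing the connection.
    static void importFrom(String host) {
        // Real code would open a JDBC connection to `host`, read the tables, then close it.
    }

    // Starts nThreads workers that drain the queue; returns how many databases were processed.
    static int drain(Queue<String> hosts, int nThreads) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                String host;
                // poll() removes the head atomically, so no host is handled twice.
                while ((host = hosts.poll()) != null) {
                    importFrom(host);
                    processed.incrementAndGet();
                }
            });
            workers[i].start();
        }
        // Wait for every worker to run out of work.
        for (Thread t : workers) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> hosts = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= 200; i++) {
            hosts.add("10.0.0." + i);
        }
        System.out.println("Databases processed: " + drain(hosts, 8));
    }
}
```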
