
I have a big list of over 20,000 items to be fetched from a DB and processed daily in a simple console-based Java app.

What is the best way to do that? Should I fetch the list in small sets and process them, or should I fetch the complete list into an array and process it? Keeping it in an array means a huge memory requirement.

Note: There is only one column to process.

Processing means I have to pass the string in that column somewhere else as a SOAP request. The 20,000 items are strings of length 15.

3 Answers


It depends. 20,000 is not really a big number. If you are only processing 20,000 short strings or numbers, the memory requirement isn't that large. But if it's 20,000 images, that's a bit larger.

There's always a tradeoff. Multiple chunks of data mean multiple trips to the database, but a single trip means more memory. Which is more important to you? Also, can your data be chunked, or do you need, for example, record 1 to be available in order to process record 1000?

These are all things to consider. Hopefully they help you decide which design is best for you.
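If the data can be chunked as described above, the batching itself is straightforward. A minimal sketch (the class and method names are illustrative, not from the question): split the full set of items into fixed-size batches so each batch can be fetched and processed independently.

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    /** Splits a list into consecutive batches of at most batchSize items. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList returns a view; copy it if the source list will change later
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }
}
```

Each batch could then be sent off as a group of SOAP requests; the batch size becomes the knob for trading memory against round trips.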


4 Comments

20,000 strings of length 15 each. Multiple trips are not an issue. The data can be chunked, as all items are independent.
At 16 bits (2 bytes) * 15 chars per string * 20,000 strings, that's only about 600 KB.
@AkhilKNambiar the data size in your case is not big enough to sweat about. Just put it in an appropriate data structure, e.g. an ArrayList. I would rather avoid multiple trips in your case.
@JeffStorey - it's a bit more than that, since each String comprises two Java objects, complete with headers and private fields. Still, it should fit into a default-sized heap with no problems.
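The back-of-envelope estimate from the comments can be checked in a couple of lines (this counts only the UTF-16 character payload, not the per-object headers the last comment mentions):

```java
public class MemEstimate {
    public static void main(String[] args) {
        int strings = 20_000;
        int chars = 15;
        // 2 bytes per char in a String's backing array (pre-compact-strings layout)
        long charBytes = (long) strings * chars * 2;
        System.out.println(charBytes / 1024 + " KB"); // well under a default heap
    }
}
```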

Correct me if I am wrong, but I would fetch it little by little, and also provide a rollback operation for it.



If the job can be done at the database level, I would do it using SQL scripts. Should this be impossible, I recommend loading small pieces of your data with two columns: the ID column and the column that needs to be processed.

This will give you better performance during processing, and if anything crashes you will not lose all the processed data. In a crash scenario, however, you need to know which datasets have been processed and which have not; this can be done using a third column or by saving the last processed ID each round.
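The "save the last processed ID" approach above can be sketched as keyset-style paging: fetch the next batch of rows with IDs greater than the last one handled, process them, and record the new resume point. In this sketch the DB fetch is abstracted behind a function so the resume logic stands alone; all names (`Row`, `fetchAfter`, the sink standing in for the SOAP call) are illustrative, not from the question.

```java
import java.util.List;
import java.util.function.BiFunction;

public class ResumableRunner {
    /** One fetched row: its ID and the single column to process. */
    public record Row(long id, String payload) {}

    /**
     * Processes rows in batches and returns the last ID handled.
     * fetchAfter(lastId, n) stands in for a query like
     * "SELECT id, payload FROM t WHERE id > ? ORDER BY id LIMIT n".
     */
    public static long run(BiFunction<Long, Integer, List<Row>> fetchAfter,
                           int batchSize, long lastDoneId, List<String> sink) {
        List<Row> batch;
        while (!(batch = fetchAfter.apply(lastDoneId, batchSize)).isEmpty()) {
            for (Row r : batch) {
                sink.add(r.payload());   // stand-in for sending the SOAP request
                lastDoneId = r.id();     // persist this value to survive a crash
            }
        }
        return lastDoneId;
    }
}
```

After a crash, restarting with the persisted `lastDoneId` picks up exactly where processing stopped, which is the crash-safety property the answer is describing.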
