
We have a large CSV file with 2.5 million rows, each with 10 fields, and we are trying to build a HashMap for each row and then add that HashMap to an ArrayList.

Because of the huge amount of data, I am not able to do this: it throws an OutOfMemoryError: Java heap space.

But my application needs the list of HashMaps (and I don't want to increase the heap space).

    reader = new CSVReader(new FileReader(dataFile), ',');
    Map<String, String> feedMap = null;
    String[] firstLine;
    String[] nextLine;
    String mappingKey = null;
    String mappingValue = null;
    // Read the header row first
    firstLine = reader.readNext();
    // Then read one data row at a time
    while ((nextLine = reader.readNext()) != null) {
        int i = 0;
        feedMap = new HashMap<String, String>();
        for (String token : nextLine) {
            mappingKey = xmlNodeMap.get(firstLine[i]);
            if (mappingKey != null) {
                mappingValue = token.trim().length() > 0 ? token : Constants.NO_VALUE;
                feedMap.put(mappingKey, mappingValue);
            }
            i++;
        }
        listOfMaps.add(feedMap);
    }
Well, to keep lots of data in memory you need lots of memory. So either process the data record by record, or keep it all in memory and increase the heap. There's no free lunch either way. Commented Dec 29, 2014 at 7:19

3 Answers


This may sound glib, but your problem is that your application needs a List of 2.5 million rows as HashMaps.

This is an absurd, unreasonable and frankly ridiculous requirement; I can't imagine what such a data structure would be good for.

Change the application to not require it.
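
For example, here is a minimal sketch of processing each row as it is read instead of accumulating a list, reusing the CSVReader, xmlNodeMap, dataFile and Constants from the question's snippet; processRow(...) is a hypothetical callback (a DB insert, an aggregation step, etc.), not something from the original code:

    CSVReader reader = new CSVReader(new FileReader(dataFile), ',');
    String[] firstLine = reader.readNext();          // header row
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
        Map<String, String> feedMap = new HashMap<String, String>();
        for (int i = 0; i < nextLine.length; i++) {
            String mappingKey = xmlNodeMap.get(firstLine[i]);
            if (mappingKey != null) {
                String token = nextLine[i];
                feedMap.put(mappingKey, token.trim().length() > 0 ? token : Constants.NO_VALUE);
            }
        }
        processRow(feedMap);   // handle the row now; nothing is added to listOfMaps
    }
    reader.close();

Only one row's map is alive at a time, so the heap usage stays flat no matter how many rows the file has.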


1 Comment

Well, finally we split the file into chunks of 100,000 rows each based on the total file size, then built the maps for every single chunk file and processed them into the DB using the list of files.
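
A rough sketch of that splitting step, assuming the CSV has a header line that should be repeated in every chunk; the splitCsv name and the .partN.csv file naming are made up for illustration:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Split a large CSV into smaller files of rowsPerChunk data rows each,
    // repeating the header at the top of every chunk.
    static List<String> splitCsv(String inputPath, int rowsPerChunk) throws IOException {
        List<String> chunkFiles = new ArrayList<String>();
        try (BufferedReader in = new BufferedReader(new FileReader(inputPath))) {
            String header = in.readLine();
            String line;
            int rowCount = 0;
            int chunkIndex = 0;
            BufferedWriter out = null;
            while ((line = in.readLine()) != null) {
                if (rowCount % rowsPerChunk == 0) {        // start a new chunk file
                    if (out != null) out.close();
                    String chunkName = inputPath + ".part" + (++chunkIndex) + ".csv";
                    chunkFiles.add(chunkName);
                    out = new BufferedWriter(new FileWriter(chunkName));
                    out.write(header);
                    out.newLine();
                }
                out.write(line);
                out.newLine();
                rowCount++;
            }
            if (out != null) out.close();
        }
        return chunkFiles;
    }

Each chunk can then be loaded into maps and pushed to the database on its own, so only one chunk's worth of data is in memory at a time.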

You can try using byte[] instead of String objects: byte[] key = mappingKey.getBytes("UTF-8")

Each String object contains a set of UTF-16 chars, which means two bytes per character in most cases. UTF-8 encoding uses one byte for ASCII characters and two bytes for many European languages.

Also, each String object holds a reference to a char array, so you end up with two objects in the heap: the String and the char array. Each object (even just new Object()) costs about 24 bytes of overhead (the exact number depends on the Java VM version and options).

So you can easily cut the number of objects in half (one byte[] instead of the String + char[] pair), and the UTF-8 byte array is usually shorter than the corresponding UTF-16 char array.
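
A minimal sketch of this idea using only JDK classes; ByteBuffer is used to wrap the key bytes because a raw byte[] has identity-based equals/hashCode and would break map lookups. The CompactRow class is hypothetical, not from the answer:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Stores one row's key/value pairs as UTF-8 bytes instead of String objects.
    final class CompactRow {
        // ByteBuffer keys give content-based equals/hashCode; raw byte[] keys would not.
        private final Map<ByteBuffer, byte[]> entries = new HashMap<ByteBuffer, byte[]>();

        void put(String key, String value) {
            entries.put(ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)),
                        value.getBytes(StandardCharsets.UTF_8));
        }

        String get(String key) {
            byte[] v = entries.get(ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)));
            return v == null ? null : new String(v, StandardCharsets.UTF_8);
        }
    }

This reduces per-entry overhead, but with 2.5 million rows it only delays the problem rather than removing it.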



Totally agree with Bohemian's answer.

To help you, I suggest that instead of reading the file once and keeping everything in memory, you read it once and maintain an "index map" (tailored to your needs). Then, when you have to search the file, you open a stream again and use your "index map" to cut down the time spent searching.

This solution relies heavily on file access, so take a look at java.nio for efficient access.
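
A minimal sketch of such an "index map", assuming the first CSV column is the lookup key, every row has at least two fields, and the file is plain ASCII; the CsvOffsetIndex class and its method names are made up for illustration. Only the byte offsets are kept in memory, not the rows:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.HashMap;
    import java.util.Map;

    final class CsvOffsetIndex {
        private final Map<String, Long> offsets = new HashMap<String, Long>();
        private final String path;

        CsvOffsetIndex(String path) throws IOException {
            this.path = path;
            try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                raf.readLine();                                 // skip the header row
                long pos = raf.getFilePointer();
                String line;
                while ((line = raf.readLine()) != null) {
                    String key = line.substring(0, line.indexOf(','));  // first column
                    offsets.put(key, pos);                      // remember where this row starts
                    pos = raf.getFilePointer();
                }
            }
        }

        // Re-open the file and jump straight to the indexed row on demand.
        String lookupRow(String key) throws IOException {
            Long pos = offsets.get(key);
            if (pos == null) {
                return null;
            }
            try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                raf.seek(pos.longValue());
                return raf.readLine();
            }
        }
    }

The index holds one String and one Long per row rather than a ten-entry HashMap, which is far smaller; lookups pay the cost of a seek and a single-line read instead of keeping everything resident.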

