
I use a HashMap to store a mapping between a parent document (key) and the list of all sub documents (value) associated with that parent. The map needs to be iterated later on to process the mapping. The key, as well as each value in the list, is usually a 75-100 character file name. This works fine, but there are 50,000 parent documents, each with 50-100 sub documents associated with it, so this creates a huge object load in memory. Is there a better, more memory-efficient way to store this mapping?

Map<String, List<String>> docmap = new HashMap<>();
  • Do you need to iterate an already populated map, or do you have an empty map that you want to populate? Commented Nov 8, 2015 at 15:27
  • I need to know if there is a better way to do this mapping other than a HashMap Commented Nov 8, 2015 at 15:30
  • It sounds like the HashMap is not the issue; it is the size of your data. Any data structure that stores all of that distinct data will take up a lot of space, unless you have a way to compress it. Assuming you can't compress it, the obvious solution is to store it in a database. Commented Nov 8, 2015 at 15:33
  • There is probably redundancy in the names to store, so compression could be applied to them, or something like a prefix tree (trie). Or delegate this data structure to a database (which manages what is in main memory and what is on disk). Commented Nov 8, 2015 at 16:06
  • Strings in Java copy the bytes they use to ensure they're immutable. You probably want to store a subclass of CharSequence that is just a view of the string, not a copy, in your HashMap. Commented Nov 8, 2015 at 16:10
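A minimal sketch of the prefix-sharing idea raised in the comments, assuming the file names contain long shared directory prefixes (the names, the `/` separator, and the pooling helper below are my own illustration, not from the question): pool each distinct prefix once, so each entry stores only a reference to the shared prefix plus its own short suffix.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixSharing {
    // One shared copy per distinct prefix.
    private static final Map<String, String> PREFIX_POOL = new HashMap<>();

    // Split a name at the last '/' and swap the prefix for the pooled copy.
    static String[] compress(String name) {
        int cut = name.lastIndexOf('/') + 1;
        String prefix = PREFIX_POOL.computeIfAbsent(name.substring(0, cut), p -> p);
        return new String[] { prefix, name.substring(cut) };
    }

    static String expand(String[] parts) {
        return parts[0] + parts[1];
    }

    public static void main(String[] args) {
        String[] c1 = compress("/data/2015/archive/doc-0001.xml");
        String[] c2 = compress("/data/2015/archive/doc-0002.xml");
        // Both entries now reference the same prefix object, so the
        // prefix characters are stored only once in memory.
        System.out.println(c1[0] == c2[0]);  // true
        System.out.println(expand(c1));      // /data/2015/archive/doc-0001.xml
    }
}
```

This only pays off when the shared prefixes are long relative to the suffixes; whether that holds depends on the actual naming scheme.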

2 Answers


This should be more memory-efficient, since it doesn't waste memory on bucket management:

        String[][] array = new String[50000][]; // one slot per parent
        array[1] = new String[100];             // children for parent 1
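The answer above only shows the allocation; how the parent names map to array indices is left open. One way to fill that gap (the parallel sorted `parents` array and the binary-search lookup below are my assumption, not part of the answer) is to keep the parent names sorted in one array and the children of `parents[i]` in `children[i]`:

```java
import java.util.Arrays;

public class ArrayMapping {
    public static void main(String[] args) {
        // Parent names kept sorted so indices can be found by binary search.
        String[] parents = { "parentA.doc", "parentB.doc", "parentC.doc" };
        String[][] children = new String[parents.length][];
        children[0] = new String[] { "a-sub1.doc", "a-sub2.doc" };
        children[1] = new String[] { "b-sub1.doc" };
        children[2] = new String[] { "c-sub1.doc", "c-sub2.doc", "c-sub3.doc" };

        // Lookup by parent name.
        int i = Arrays.binarySearch(parents, "parentB.doc");
        System.out.println(Arrays.toString(children[i])); // [b-sub1.doc]

        // Iteration over the whole mapping.
        for (int p = 0; p < parents.length; p++) {
            System.out.println(parents[p] + " -> " + children[p].length + " sub documents");
        }
    }
}
```

The trade-off versus a HashMap is O(log n) lookup instead of O(1), in exchange for avoiding per-entry node and bucket overhead.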



Your structure is not inefficient, and there is no point in looking for anything better.

I calculate that the space used by just your strings is likely to be around 650MB, ignoring the overhead of the hashmaps and lists.
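The ~650 MB figure can be checked with back-of-the-envelope arithmetic, counting characters only and ignoring String/char[] object headers (the midpoint values below are my reading of the 50-100 sub documents and 75-100 character ranges in the question; Java strings store UTF-16, 2 bytes per char):

```java
public class SizeEstimate {
    public static void main(String[] args) {
        long parents = 50_000;
        long childrenPerParent = 75;  // midpoint of 50-100
        long nameLength = 87;         // midpoint of 75-100 chars
        long strings = parents + parents * childrenPerParent; // 3,800,000 names
        long bytes = strings * nameLength * 2;                // UTF-16: 2 bytes/char
        // Roughly 630 MB of raw character data - in line with ~650 MB.
        System.out.printf("%d strings, ~%d MB%n", strings, bytes / (1024 * 1024));
    }
}
```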

I don't know the exact overhead of ArrayList or HashMap, but it is not going to be much in comparison to the size of the sublists. Even if each list and hash entry cost you 20 bytes, that would only be about 2MB - a drop in the ocean.

So your problem is not the hash maps or the lists - it is your raw data.

If 650MB is too much to store in memory (it is not that much these days), then your only option is to store it in a database.

