3

I wrote a custom index to a custom table which uses 500MB of heap for 500k strings. Only 10% of the strings are unique; the rest are repeats. Every string is of length 4.

How i can optimize my code? Should I use another collection? I tried to implement a custom string pool to save memory:

public class StringPool {

    private static WeakHashMap<String, String> map = new WeakHashMap<>();

    public static String getString(String str) { 
        if (map.containsKey(str)) {
            return map.get(str);
        } else {
            map.put(str, str);
            return map.get(str);
        }
    }
}

private void buildIndex() {
        if (monitorModel.getMessageIndex() == null) {
            // the index, every columns create an index
            ArrayList<HashMap<String, TreeSet<Integer>>> messageIndex = new ArrayList<>(filterableColumn.length);
            for (int i = filterableColumn.length; i >= 0; i--) {
                // key -> string,   value -> treeset, the row wich contains the key
                HashMap<String, TreeSet<Integer>> hash = new HashMap<>();
                messageIndex.add(hash);
            }
            // create index for every column
            for (int i = monitorModel.getParser().getMyMessages().getMessages().size() - 1; i >= 0; --i) {
                TreeSet<Integer> tempList;

                for (int j = 0; j < filterableColumn.length; j++) {
                    String value  = StringPool.getString(getValueAt(i, j).toString());
                    if (!messageIndex.get(j).containsKey(value)) {
                        tempList = new TreeSet<>();
                        messageIndex.get(j).put(value, tempList);
                    } else {
                        tempList = messageIndex.get(j).get(value);
                    }

                    tempList.add(i);
                }
            }
            monitorModel.setMessageIndex(messageIndex);
        }
    }
4
  • 1
    500,000 4 character strings is only a few dozen megabytes of memory with no caching at all. Think you're looking in the wrong place. Commented Aug 15, 2012 at 17:30
  • I agree with Affe, that should not exceed a couple of MB, even assuming 50 Bytes per 4 letter String (which is pessimistic) would only get you to 25MB. Commented Aug 15, 2012 at 17:38
  • 1
    ArrayList<HashMap<String, TreeSet<Integer>>> -- Wow, that a structure! :) You impose huge overhead using such data structure. It very well can be a reason of the high memory consumption, not Strings themselves. I have written a blog post some time ago about Java Collection overhead: plumbr.eu/blog/fat-collections Commented Aug 16, 2012 at 7:33
  • Thanks, this kind of answer i search, i will read your blog. Commented Aug 16, 2012 at 10:46

2 Answers 2

5

No need to come up with a custom pool. Just use String.intern().

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks your answer, i try this but it's didn't work, reduce the heap only -2MB.
This would save you the trouble of creating your own string pool. I do believe @AaronD answer is more on point though, a lot of your memory usage may be coming from the number of nested data structures being instantiated.
4

You might want to examine your memory heap in a profiler. My guess is that the memory consumption isn't primarily in the String storage, but in the many TreeSet<Integer> instances. If so, you could optimize considerably by using primitive arrays (int[], short[], or byte[], depending on the actual size of the integer values you're storing). Or you could look into a primitive collection type, such as those provided by FastUtil or Trove.

If you do find that the String storage is problematic, I'll assume that you want to scale your application beyond 500k Strings, or that especially tight memory constraints require you to deduplicate even short Strings.

As Dev said, String.intern() will deduplicate Strings for you. One caveat, however - in the Oracle and OpenJDK virtual machines, String.intern() will store those Strings in the VM permanent-generation, such that they will not be garbage-collected in the future. That's appropriate (and helpful) if:

  1. The Strings you're storing do not change throughout the life of the VM (e.g., if you read in a static list at startup and use it throughout the life of your application).
  2. The Strings you need to store fit comfortably in the VM permanent generation (with adequate room for classloading and other consumers of PermGen). Update: see below.

If either of those conditions is false, you are probably correct to build a custom pool. But my recommendation is that you consider a simple HashMap in place of the WeakHashMap you're currently using. You probably don't want these values to be garbage-collected while they're in your cache, and WeakHashMap adds another level of indirection (and the associated object pointers), increasing memory consumption further.

Update: I'm told that JDK 7 stores interned Strings (String.intern()) in the main heap, not in perm-gen, as earlier JDKs did. That makes String.intern() less risky if you're using JDK 7.

1 Comment

Thanks your answer Aaron, i try to work with primitive collection, i try String.intern(), but my heap reduces more memory.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.