2

I'm developing an application which loads lots of data (like from csv).

I'm creating a List<List<SimpleCell>> and loading the cells into it as they are read. The SimpleCell class contains 5 Strings, and each String has 10 characters on average.

So I reckon that if I read 1000 rows, each containing 160 columns, that gives 1000*160 = 160 000 SimpleCell instances, which should be about 160 000 * sizeof(SimpleCell.class) =~ 160 000 * 10 * 5 = 8 000 000 bytes =~ 7.63 MB.

But when I look at jconsole (after clicking Perform GC), memory usage is around 790 MB. How could this be?

Note that I don't store any references to any "temporary" objects. Here is the code where the memory usage rises:

        for(int i = r.getFromIndex(); i <= r.getToIndex(); ++i) {
            System.out.println("Processing: 'ZZ " + i + "'");
            List<SimpleCell> values = saxRead("ZT/ZZ " + i + "");
            rows.add(values);
        }

saxRead just creates an InputStream, parses it with SAX, closes the stream, and returns the cells (created by SAXHandler), so there are only local variables (which I assume will be garbage collected in the near future).

I'm running out of heap space when reading 1000 rows, but I must read approximately 7k.

Obviously there's something that I don't know about JVM memory. So why is memory usage so huge when loading this relatively small amount of data?

6
  • If you make a System.out.println ("Values size " + values.size() ) are you having a decent number of SimpleCells instances? Commented Sep 19, 2012 at 19:17
  • Is there some reason you can't process your file incrementally? Commented Sep 19, 2012 at 19:19
  • @HernanVelasquez values.size() always returns 160 - there's a constant indicating that in SAXHandler. @Wug I'm processing it incrementally - saxRead reads one file and one row from it. Commented Sep 19, 2012 at 19:23
  • Is it possible for you to post the saxRead method? Commented Sep 19, 2012 at 19:26
  • @Xeon: so ... don't add it to a list, process it and drop it? Commented Sep 19, 2012 at 19:27

4 Answers

3

A String uses 48 bytes plus the size of the text * 2. (Each character is 2 bytes) The Simple Cell object uses 40 bytes and the List of them uses 1064 bytes.

This means each row uses 1064 + 160 * 40 + 5 * 160 * (48 + 20) bytes or about 62K. If you have 1000 lines you will be using about 62 MB, which is much less than what you are seeing.
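The per-row arithmetic above can be checked with a small sketch. The class and method names here are hypothetical, and the byte figures (48, 40, 1064) are simply the estimates quoted in this answer, not measured values; the inputs are the 160 columns, 5 Strings per cell and 10 characters per String from the question.

```java
// Hypothetical helper: reproduces the per-row estimate from this answer.
final class RowEstimate {
    // Estimates taken from the answer text, not measurements.
    static final long STRING_OVERHEAD = 48; // bytes per String, excluding its text
    static final long BYTES_PER_CHAR = 2;   // UTF-16 code unit
    static final long CELL_SIZE = 40;       // bytes per SimpleCell object
    static final long LIST_SIZE = 1064;     // bytes per row list

    static long bytesPerRow(int columns, int stringsPerCell, int charsPerString) {
        long perString = STRING_OVERHEAD + BYTES_PER_CHAR * charsPerString;
        return LIST_SIZE
                + columns * CELL_SIZE
                + (long) columns * stringsPerCell * perString;
    }

    public static void main(String[] args) {
        long perRow = bytesPerRow(160, 5, 10);
        System.out.println("bytes per row:       " + perRow);
        System.out.println("bytes per 1000 rows: " + perRow * 1000);
    }
}
```

Either way the total lands in the tens of megabytes for 1000 rows, nowhere near 790 MB.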

I suggest you use a memory profiler, e.g. VisualVM or YourKit, to see exactly how much memory is being used by what.

Depending on how you construct the Strings, you may retain even more memory than this. For example, it's likely you are retaining a reference to the original XML: when you take a substring of it, the substring shares the original's char[] rather than copying it (this is how substring behaves before Java 7 update 6), so the whole buffer stays reachable.
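One way to avoid that kind of retention is to copy the extracted text into a fresh String before storing it. This is a minimal sketch with hypothetical names; on JVMs from Java 7 update 6 onward, substring already copies, so the extra copy only matters on older runtimes.

```java
// Hypothetical helper: detaches an extracted value from the buffer it came from.
final class TrimmedCopy {
    /** Returns the characters in [from, to) as a String with its own storage. */
    static String detach(String source, int from, int to) {
        // new String(String) allocates a new String object; on old JVMs it also
        // trims the backing array down to just the characters of the substring.
        return new String(source.substring(from, to));
    }

    public static void main(String[] args) {
        String xml = "<cell>value</cell>";
        String value = detach(xml, 6, 11);
        System.out.println(value);
    }
}
```

Storing the detached value in the cell means the large source buffer can be collected once parsing is done.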


You may find this class useful. It will reduce the amount of memory Strings use if they are using more than they need and reduce duplicates using a fixed size cache.

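// Nested helper class; the enclosing file needs this import.
import java.lang.ref.WeakReference;

static class StringCache {
    // One slot per hash bucket; a colliding entry simply overwrites the old one.
    final WeakReference<String>[] strings;
    final int mask;

    @SuppressWarnings("unchecked")
    StringCache(int size) {
        // Round the capacity up to a power of two (minimum 128) so that
        // "hash & mask" always yields a valid index.
        int size2 = 128;
        while (size2 < size)
            size2 *= 2;
        strings = new WeakReference[size2];
        mask = size2 - 1;
    }

    public String intern(String text) {
        if (text.length() == 0) return "";

        int hash = text.hashCode() & mask;
        WeakReference<String> wrs = strings[hash];
        if (wrs != null) {
            // get() returns null if the cached String was already collected.
            String ret = wrs.get();
            if (text.equals(ret))
                return ret;
        }
        // Copy so the cached String does not pin a larger backing array.
        String ret = new String(text);
        strings[hash] = new WeakReference<String>(ret);
        return ret;
    }
}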

5 Comments

That's assuming SimpleCell extends Object. If there are intermediate classes for whatever reason, the weight will be greater. That's also assuming an ArrayList; a LinkedList would also weigh more.
I agree, what the OP has told us only explains about 10% of the total memory usage.
Correct - this was the substring issue.
The way around this is to use new String(string) which is usually pointless but in this case will ensure the String isn't holding on to a char[] any larger than it needs to be.
You might find the string caching class useful to reduce memory consumption esp if you have duplicate strings.
2

JVM memory management introduces a lot of overhead. For example, on a 32-bit VM, a String with 5 characters consumes 58 bytes of memory (not only 5!):

JVM overhead: 16b + bookkeeping fields: 12b + pointer to char[]: 4b + char[] jvm overhead: 16b + data:10b


2

Use VisualVM to profile your heap usage, and be prepared to be surprised.
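For a quick sanity check before reaching for a profiler, heap usage can also be sampled from inside the program with Runtime. The sketch below is only a rough probe under stated assumptions: the class name is hypothetical, String[5] stands in for SimpleCell, the generated values are placeholders for real cell data, and System.gc() is only a hint, so the numbers vary between runs and JVMs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: rough before/after heap reading, not a substitute for a profiler.
final class HeapProbe {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // a request only; the JVM may ignore it
        long before = usedHeap();

        // Stand-in for 1000 rows x 160 columns of cells with 5 Strings each.
        List<String[]> cells = new ArrayList<>();
        for (int i = 0; i < 160_000; i++) {
            String[] cell = new String[5];
            for (int j = 0; j < 5; j++) {
                cell[j] = "v" + i + "_" + j; // placeholder value
            }
            cells.add(cell);
        }

        System.gc();
        long after = usedHeap();
        System.out.println("cells held:       " + cells.size());
        System.out.println("approx. bytes:    " + (after - before));
        System.out.println("approx. per cell: " + (after - before) / cells.size());
    }
}
```

If the per-cell figure from a probe like this is far below what the real application shows, the extra memory is coming from something other than the cells themselves.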


1

Java is very memory hungry. Consider these estimates:

32-bit VM:

Size of one of your String (approx)

10 UTF-16 chars = 20 bytes

1 array length = 4 bytes

1 array object header = 8 bytes

1 array reference = 4 bytes

1 offset, count, hashcode (internal fields) = 12 bytes

1 object header = 8 bytes

1 of your typical Java Strings = 20 + 4 + 8 + 4 + 12 + 8 = 56 bytes

Size of a Simple Cell (approx, including Strings)

5 Strings = 56 * 5 = 280 bytes

5 String references = 5 * 4 bytes = 20 bytes

1 object header = 8 bytes

1 SimpleCell = 280 + 20 + 8 = 308 bytes

160000 SimpleCell = 308 * 160000 = 49280000 bytes

64-bit VM (with no compressed oops)

Size of one of your String (approx)

10 UTF-16 chars = 20 bytes

1 array length = 4 bytes

1 array object header = 8 bytes

1 array reference = 8 bytes

1 offset, count, hashcode (internal fields) = 12 bytes

1 object header = 8 bytes

1 of your typical Java Strings = 20 + 4 + 8 + 8 + 12 + 8 = 60 bytes

Size of a Simple Cell (approx, including Strings)

5 Strings = 60 * 5 = 300 bytes

5 String references = 5 * 8 bytes = 40 bytes

1 object header = 8 bytes

1 SimpleCell = 300 + 40 + 8 = 348 bytes

160000 SimpleCell = 348 * 160000 = 55680000 bytes
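Both estimates follow the same sum with only the reference size changing, which can be sketched as below. The class and method names are hypothetical, the header and field sizes are the approximate figures listed in this answer, and alignment padding is ignored.

```java
// Hypothetical calculator for the estimates listed above.
final class CellEstimate {
    /** Approximate bytes for one String of 'chars' characters. */
    static long stringBytes(int chars, int refSize) {
        long charData = 2L * chars;  // UTF-16 chars
        long arrayLength = 4;
        long arrayHeader = 8;
        long internalFields = 12;    // offset, count, hashcode
        long objectHeader = 8;
        return charData + arrayLength + arrayHeader + refSize
                + internalFields + objectHeader;
    }

    /** Approximate bytes for one cell holding 'strings' Strings. */
    static long cellBytes(int strings, int chars, int refSize) {
        long objectHeader = 8;
        return strings * stringBytes(chars, refSize)
                + (long) strings * refSize
                + objectHeader;
    }

    public static void main(String[] args) {
        for (int refSize : new int[] {4, 8}) {
            long cell = cellBytes(5, 10, refSize);
            System.out.println(refSize + "-byte references: " + cell
                    + " bytes per cell, " + cell * 160_000 + " bytes for 160000 cells");
        }
    }
}
```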

Obviously that is still very far from your 790 MB (which looks like a leak), but it is several times more than what you estimated.
