
I have the below class.

class MyObject implements Serializable {
    private String key;
    private String val;
    private int num;

    MyObject(String a, String b, int c) {
        this.key = a;
        this.val = b;
        this.num = c;
    }
}

I need to create a list of objects, and the following method is called repeatedly (say 10K times or more):

public void addToIndex(String a, String b, int c) {
    MyObject ob = new MyObject(a,b,c);
    list.add(ob); // List<MyObject>
}

I used a profiler to see the memory footprint, and it increases a lot because an object is created every time. Is there a better way of doing this? I then write the list to disk.

EDIT: This is how I write the list once it is fully populated. Is there a way to append to the file once memory use goes beyond some limit (for example, a list size)?

// file: the target File on disk (variable name assumed)
ObjectOutputStream oos = new ObjectOutputStream(
        new DeflaterOutputStream(new FileOutputStream(file)));
oos.writeObject(list);
oos.close();
  • Using arrays would be faster if you knew from the start how many objects you need. Commented Oct 5, 2013 at 16:11
  • Do you need to keep all the elements in the list at all times? If not, you can move some of the data to disk as a file or database. Commented Oct 5, 2013 at 16:12
  • No, I won't be able to know that. Commented Oct 5, 2013 at 16:12
  • @Naren For that many instantiations, you will need a caching mechanism for the program to perform well. Your intention is quite unclear to me; don't you have an alternative to this low-performance solution? Commented Oct 5, 2013 at 16:16
  • What is "so much"? What is the length of the strings? 10,000 objects is nothing nowadays. Assuming 10 KB per object (that's already long strings), this would only take 100 MB for 10,000 objects. Do you have OutOfMemoryErrors? If not, then why do you care? That said, if the goal is to write to disk, why don't you write to disk directly instead of storing everything in memory? Commented Oct 5, 2013 at 16:18
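Following the comment's suggestion to write to disk directly, and the EDIT's question about appending once the list reaches a size limit, here is a minimal sketch of a writer that keeps one stream open and flushes its buffered entries whenever a threshold is reached. The class BatchingIndexWriter, its Entry type, the threshold parameter and the boolean-marker file format are all hypothetical, chosen for illustration. It uses DataOutputStream rather than ObjectOutputStream because an ObjectOutputStream keeps a reference to every object it has written (until reset() is called), which would keep those objects in memory anyway.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: buffers entries and writes them to a compressed file
// whenever the buffer reaches `threshold`, so memory use stays bounded.
class BatchingIndexWriter implements Closeable {

    // Minimal stand-in for the question's MyObject.
    static final class Entry {
        final String key;
        final String val;
        final int num;

        Entry(String key, String val, int num) {
            this.key = key;
            this.val = val;
            this.num = num;
        }
    }

    private final DataOutputStream out;
    private final int threshold;
    private final List<Entry> buffer = new ArrayList<>();

    BatchingIndexWriter(File file, int threshold) throws IOException {
        this.out = new DataOutputStream(new BufferedOutputStream(
                new DeflaterOutputStream(new FileOutputStream(file))));
        this.threshold = threshold;
    }

    // Same parameters as the question's addToIndex; the buffer never holds
    // more than `threshold` entries.
    void addToIndex(String a, String b, int c) throws IOException {
        buffer.add(new Entry(a, b, c));
        if (buffer.size() >= threshold) {
            flushBuffer();
        }
    }

    private void flushBuffer() throws IOException {
        for (Entry e : buffer) {
            out.writeBoolean(true); // marker: another entry follows
            out.writeUTF(e.key);
            out.writeUTF(e.val);
            out.writeInt(e.num);
        }
        buffer.clear(); // drop the references so the entries can be collected
    }

    @Override
    public void close() throws IOException {
        flushBuffer();
        out.writeBoolean(false); // end-of-data marker
        out.close();             // also finishes the deflater and closes the file
    }

    // Reads the entries back in the order they were added.
    static List<Entry> readAll(File file) throws IOException {
        List<Entry> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new InflaterInputStream(new FileInputStream(file))))) {
            while (in.readBoolean()) {
                // Java evaluates arguments left to right, matching the write order.
                result.add(new Entry(in.readUTF(), in.readUTF(), in.readInt()));
            }
        }
        return result;
    }
}
```

A threshold of 1, i.e. writing each entry as soon as it arrives, would also keep memory flat, because BufferedOutputStream already groups the bytes before they reach the deflater.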

2 Answers


I used a profiler to see the memory footprint, and it increases a lot because an object is created every time. Is there a better way of doing this?

Java serialization doesn't use that much memory in your situation. What it does do is create a lot of garbage, far more than you might imagine. It also produces very verbose output, which can be improved by compression, as you are already doing.
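To see how much of the serialized output is redundancy that compression removes, one option is to serialize the same list into an in-memory buffer with and without a DeflaterOutputStream. This sketch only illustrates the verbosity point, not the garbage; the class SerializationSizeCheck and its helpers serializedSize and sample are hypothetical, and the exact byte counts will depend on your data and JDK.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

class SerializationSizeCheck {

    // Stand-in for the question's MyObject.
    static class MyObject implements Serializable {
        private final String key;
        private final String val;
        private final int num;

        MyObject(String a, String b, int c) {
            key = a;
            val = b;
            num = c;
        }
    }

    // Number of bytes Java serialization produces for `list`,
    // optionally passed through a DeflaterOutputStream.
    static int serializedSize(List<MyObject> list, boolean compress) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStream target = compress ? new DeflaterOutputStream(bytes) : bytes;
        try (ObjectOutputStream oos = new ObjectOutputStream(target)) {
            oos.writeObject(list);
        } // closing the ObjectOutputStream also finishes the deflater
        return bytes.size();
    }

    static List<MyObject> sample(int n) {
        List<MyObject> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(new MyObject("key-" + i, "value-" + i, i));
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        List<MyObject> list = sample(10_000);
        System.out.printf("raw: %,d bytes, deflated: %,d bytes%n",
                serializedSize(list, false), serializedSize(list, true));
    }
}
```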

A simple way to improve this is to use Externalizable instead of Serializable. This can dramatically reduce the garbage produced and make the output more compact. It can also be much faster, with lower overhead.
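A minimal sketch of what an Externalizable version of MyObject might look like. Externalizable requires a public no-arg constructor, which ObjectInputStream calls before readExternal fills in the fields. The getters are hypothetical additions, included only so a round trip can be checked; they are not part of the original class.

```java
import java.io.*;

// Sketch: the question's MyObject using Externalizable instead of Serializable.
class MyObject implements Externalizable {
    private String key;
    private String val;
    private int num;

    // Required by Externalizable: deserialization calls this constructor,
    // then readExternal restores the fields.
    public MyObject() {
    }

    MyObject(String a, String b, int c) {
        this.key = a;
        this.val = b;
        this.num = c;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write only the field values, in a fixed order.
        out.writeUTF(key);
        out.writeUTF(val);
        out.writeInt(num);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read in exactly the order writeExternal wrote.
        key = in.readUTF();
        val = in.readUTF();
        num = in.readInt();
    }

    // Hypothetical accessors, added only for checking the round trip.
    String getKey() { return key; }
    String getVal() { return val; }
    int getNum() { return num; }
}
```

Because the format is whatever writeExternal writes, adding or reordering fields later changes it, so files written by an older version would no longer read back correctly unless you add your own version marker.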

BTW, you can get even better performance if you use custom serialization for the list itself.

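import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class Main {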
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        List<MyObject> list = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list.add(new MyObject("key-" + i, "value-" + i, i));
        }

        for (int i = 0; i < 10; i++) {
            timeJavaSerialization(list);
            timeCustomSerialization(list);
            timeCustomSerialization2(list);
        }
    }

    private static void timeJavaSerialization(List<MyObject> list) throws IOException, ClassNotFoundException {
        File file = File.createTempFile("java-serialization", "dz");
        long start = System.nanoTime();
        ObjectOutputStream oos = new ObjectOutputStream(
                new DeflaterOutputStream(new FileOutputStream(file)));
        oos.writeObject(list);
        oos.close();
        ObjectInputStream ois = new ObjectInputStream(
                new InflaterInputStream(new FileInputStream(file)));
        Object o = ois.readObject();
        ois.close();
        long time = System.nanoTime() - start;
        long size = file.length();
        System.out.printf("Java serialization uses %,d bytes and took %.3f seconds.%n",
                size, time / 1e9);
    }

    private static void timeCustomSerialization(List<MyObject> list) throws IOException {
        File file = File.createTempFile("custom-serialization", "dz");
        long start = System.nanoTime();
        MyObject.writeList(file, list);
        Object o = MyObject.readList(file);
        long time = System.nanoTime() - start;
        long size = file.length();
        System.out.printf("Faster Custom serialization uses %,d bytes and took %.3f seconds.%n",
                size, time / 1e9);
    }

    private static void timeCustomSerialization2(List<MyObject> list) throws IOException {
        File file = File.createTempFile("custom2-serialization", "dz");
        long start = System.nanoTime();
        {
            DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
                    new DeflaterOutputStream(new FileOutputStream(file))));
            dos.writeInt(list.size());
            for (MyObject mo : list) {
                dos.writeUTF(mo.key);
            }
            for (MyObject mo : list) {
                dos.writeUTF(mo.val);
            }
            for (MyObject mo : list) {
                dos.writeInt(mo.num);
            }
            dos.close();
        }
        {
            DataInputStream dis = new DataInputStream(new BufferedInputStream(
                    new InflaterInputStream(new FileInputStream(file))));
            int len = dis.readInt();
            String[] keys = new String[len];
            String[] vals = new String[len];
            List<MyObject> list2 = new ArrayList<>(len);
            for (int i = 0; i < len; i++) {
                keys[i] = dis.readUTF();
            }
            for (int i = 0; i < len; i++) {
                vals[i] = dis.readUTF();
            }
            for (int i = 0; i < len; i++) {
                list2.add(new MyObject(keys[i], vals[i], dis.readInt()));
            }
            dis.close();
        }
        long time = System.nanoTime() - start;
        long size = file.length();
        System.out.printf("Compact Custom serialization uses %,d bytes and took %.3f seconds.%n",
                size, time / 1e9);
    }


    static class MyObject implements Serializable {
        private String key;
        private String val;
        private int num;

        MyObject(String a, String b, int c) {
            this.key = a;
            this.val = b;
            this.num = c;
        }

        MyObject(DataInput in) throws IOException {
            key = in.readUTF();
            val = in.readUTF();
            num = in.readInt();
        }

        public void writeTo(DataOutput out) throws IOException {
            out.writeUTF(key);
            out.writeUTF(val);
            out.writeInt(num);
        }

        public static void writeList(File file, List<MyObject> list) throws IOException {
            DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
                    new DeflaterOutputStream(new FileOutputStream(file))));
            dos.writeInt(list.size());
            for (MyObject mo : list) {
                mo.writeTo(dos);
            }
            dos.close();
        }

        public static List<MyObject> readList(File file) throws IOException {
            DataInputStream dis = new DataInputStream(new BufferedInputStream(
                    new InflaterInputStream(new FileInputStream(file))));
            int len = dis.readInt();
            List<MyObject> list = new ArrayList<>(len);
            for (int i = 0; i < len; i++) {
                list.add(new MyObject(dis));
            }
            dis.close();
            return list;
        }
    }
}

On the final iteration it prints:

Java serialization uses 61,168 bytes and took 0.061 seconds.
Faster Custom serialization uses 62,519 bytes and took 0.024 seconds.
Compact Custom serialization uses 68,225 bytes and took 0.020 seconds.

As you can see, my attempt to make the file more compact made it faster instead (its file is actually the largest of the three), which is a good example of why you should test performance improvements.


2 Comments

I ran that code. This is the memory report from the previous run: commondatastorage.googleapis.com/naren%2FMem_2.html and this is the one from the run with the replacement code: commondatastorage.googleapis.com/naren%2FMem_1.html
You can see that most of the objects created in the replacement are the Strings for key and val, which you can't avoid without a significant structural change.

Consider using fast-serialization. It is source-level compatible with JDK serialization and creates less bloat. Additionally, it beats most handcrafted Externalizable serialization, because performance is hurt not only by the JDK serialization implementation itself but also by the inefficient input/output stream implementations of the stock JDK.

http://code.google.com/p/fast-serialization/
