1

I am trying to parse a large JSON file (more than 600 MB) with Java. My JSON file looks like this:

{
    "0" : {"link_id": "2381317", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "42", "type": "Gamer", "website": "http://www.google.com",  "name": "troll", "country": "United Kingdom", "sp": "Management Consulting" },
    "1" : {"link_id": "2381316", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "41", "type": "Gamer", "website": "http://www.google2.com",  "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
    [....]

    "345240" : {"link_id": "2381314", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "23", "type": "Gamer", "website": "http://www.google2.com",  "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
}

My code looks like this:

import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class dumpExtractor {

    private static final String filePath = "/home/troll/Documents/analyse/lol.json";

    public static void main(String[] args) {
        // try-with-resources closes the reader once parsing is finished
        try (FileReader reader = new FileReader(filePath)) {
            // parse(...) builds the whole document in memory before returning
            JSONParser jsonParser = new JSONParser();
            JSONObject jsonObject = (JSONObject) jsonParser.parse(reader);
            Iterator<JSONObject> iterator = jsonObject.values().iterator();

            // print a few fields of every nested object
            while (iterator.hasNext()) {
                JSONObject jsonChildObject = iterator.next();
                System.out.println("==========================");
                String name = (String) jsonChildObject.get("name");
                System.out.println("Industry name: " + name);

                String type = (String) jsonChildObject.get("type");
                if (type != null && !type.isEmpty()) {
                    System.out.println("type: " + type);
                }

                String sp = (String) jsonChildObject.get("sp");
                if (sp != null && !sp.isEmpty()) {
                    System.out.println("sp: " + sp);
                }
                System.out.println("==========================");
            }
            System.out.println("done ! ");
        } catch (IOException | ParseException ex) {
            // parse(...) can throw both exception types
            ex.printStackTrace();
        }
    }
}

I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.createEntry(HashMap.java:897)
    at java.util.HashMap.addEntry(HashMap.java:884)
    at java.util.HashMap.put(HashMap.java:505)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)

How can I fix this?

Thanks in advance.

  • The problem is that the complete object is too large to hold in memory. Try reading the file piece by piece and parsing each of the nested objects separately. Commented Sep 11, 2015 at 14:43
  • Can you add the lines with your import statements so we can see what Parser you're using? Commented Sep 11, 2015 at 14:45
  • @CarlosBribiescas The stack trace shows it: org.json.simple.parser.JSONParser Commented Sep 11, 2015 at 14:46
  • docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/… Commented Sep 11, 2015 at 14:49
  • How much RAM do you allow the JVM to allocate? See stackoverflow.com/questions/14763079/… Commented Sep 11, 2015 at 14:49

3 Answers

3

If you have to read huge JSON files, you can't keep all of the data in memory. Increasing the heap can work for a 1 GB file, but what if tomorrow's file is 2 GB?

The right approach to this problem is to parse the JSON element by element using a streaming parser. Instead of loading the whole document into memory and building one big object that represents it, you read single elements of the JSON and convert them to objects one at a time, discarding each element once you are done with it.

Here is a nice article explaining how to do it with the Jackson library.
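
To make the element-by-element idea concrete, here is a minimal sketch that uses only the JDK. It is not Jackson and not a full JSON parser: the class name `EntrySplitter` and its methods are made up for illustration, and it assumes the file has the shape shown in the question (one top-level object whose values are themselves objects). It tracks brace depth and string literals to cut out one nested object at a time, so only a single entry is held in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class EntrySplitter {

    /**
     * Reads a top-level JSON object and passes the raw text of each nested
     * object value to the handler, one at a time. Top-level values that are
     * not objects are ignored (an assumption that matches the question's file).
     */
    static void forEachEntry(Reader in, Consumer<String> handler) throws IOException {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // brace nesting level: 1 = top-level object
        boolean inString = false; // true while inside a string literal
        boolean escaped = false;  // true right after a backslash in a string
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            boolean capturing = depth >= 2; // inside a nested object
            if (inString) {
                if (capturing) current.append(ch);
                if (escaped) escaped = false;
                else if (ch == '\\') escaped = true;
                else if (ch == '"') inString = false;
                continue; // braces inside strings must not change the depth
            }
            if (ch == '"') {
                inString = true;
                if (capturing) current.append(ch);
            } else if (ch == '{') {
                depth++;
                if (depth >= 2) current.append(ch);
            } else if (ch == '}') {
                if (depth >= 2) current.append(ch);
                depth--;
                if (depth == 1) {
                    // one complete nested object: hand it over and forget it
                    handler.accept(current.toString());
                    current.setLength(0);
                }
            } else if (capturing) {
                current.append(ch);
            }
        }
    }

    /** Convenience wrapper for small in-memory inputs, e.g. in tests. */
    static List<String> splitAll(String json) {
        List<String> out = new ArrayList<>();
        try {
            forEachEntry(new StringReader(json), out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "{\"0\": {\"name\": \"troll\"}, \"1\": {\"name\": \"troll2\"}}";
        for (String entry : splitAll(sample)) {
            System.out.println(entry);
        }
    }
}
```

For a real file, `forEachEntry` would be given a `BufferedReader` wrapped around the `FileReader`, and each small chunk could then be parsed on its own, for example with `new JSONParser().parse(chunk)` from json-simple. A streaming parser from a library such as Jackson does the same job with full JSON validation and is the better choice for production code.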

2

You have two choices:

  1. Give more memory to the Java program by specifying the -Xmx argument, e.g. -Xmx1g to give it a 1 GB heap.
  2. Use a "streaming" JSON parser. This scales to arbitrarily large JSON files, because memory use depends on the size of one element rather than on the size of the whole file.

json-simple has a streaming API. See https://code.google.com/p/json-simple/wiki/DecodingExamples#Example_5_-_Stoppable_SAX-like_content_handler

There are other libraries with good streaming parsers, e.g. Jackson.

1

Increase the JVM heap size by setting the _JAVA_OPTIONS environment variable (Windows syntax, with no spaces around the equals sign):

SET _JAVA_OPTIONS=-Xms512m -Xmx1024m

But this can't be a permanent solution, because your file may grow in the future.
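
To check that a heap setting actually reached the JVM, the limit can be printed from inside the program. This is a small sketch using only `Runtime.getRuntime().maxMemory()` from the JDK; the class name `HeapInfo` is made up for illustration.

```java
// Hypothetical helper, for illustration only.
final class HeapInfo {

    /** Maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not change after setting -Xmx, the option was not picked up by the process that runs the program.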
