5

I have a large JSON file, about 40 GB in size. When I try to convert this file, which contains an array of objects, into a collection of Java objects, the program crashes. I have tried many different maximum heap sizes (-Xmx), but nothing has worked.

public Set<Interlocutor> readJsonInterlocutorsToPersist() {
    String userHome = System.getProperty(USER_HOME);
    log.debug("Read file interlocutors "+userHome);
    try {
        ObjectMapper mapper = new ObjectMapper();
        // JSON file to Java object
        Set<Interlocutor> interlocutorDeEntities = mapper.readValue(
                new File(userHome + INTERLOCUTORS_TO_PERSIST),
                new TypeReference<Set<Interlocutor>>() {
                });
        return interlocutorDeEntities;
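    } catch (Exception e) {
        // Pass the exception itself so the stack trace is logged; with no {}
        // placeholder in the message, e.getMessage() would not be printed.
        log.error("Exception while reading InterlocutorsToPersist file.", e);
        return null;
    }
}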

Is there a way to read this file using BufferedReader and then to push object by object?

5
  • If it's 40 GB of JSON, I doubt the whole dataset will fit in your memory, even once it is deserialized into a Set of objects. Commented Jul 1, 2020 at 10:14
  • You could write your own parser using JsonParser.nextToken(). Commented Jul 1, 2020 at 10:14
  • baeldung.com/jackson-streaming-api Commented Jul 1, 2020 at 10:27
  • Streaming APIs are available, for example sites.google.com/site/gson/streaming. These discard the string data (JSON) as soon as it has been deserialized into Java objects. Commented Jul 1, 2020 at 10:30
  • Why do you need it as a list of objects? Commented Jul 1, 2020 at 10:47

2 Answers

4

You should definitely have a look at the Jackson Streaming API (https://www.baeldung.com/jackson-streaming-api). I have used it myself for JSON files several gigabytes in size. The great thing is that you can divide your JSON into several smaller JSON objects and then parse each of them with mapper.readTree(parser). That way you combine the convenience of normal Jackson with the speed and scalability of the Streaming API.

Related to your problem:

As I understand it, you have one really large array (which is the reason for the file size) that contains much smaller, more manageable objects:

e.g.:

[ // 40GB
{}, // Only 400 MB
{},
]

What you can do now is parse the file with Jackson's Streaming API and step through the array element by element. Each individual object can still be parsed as a "regular" Jackson object and then processed easily.
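A minimal sketch of that approach, assuming jackson-databind is on the classpath. The Interlocutor class below is a hypothetical stand-in with a single name field, and InterlocutorStreamReader and streamArray are illustrative names, not part of Jackson. Only one element is materialized at a time, so memory use depends on the size of a single object rather than on the size of the file:

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

public class InterlocutorStreamReader {

    // Hypothetical stand-in for the question's Interlocutor class.
    public static class Interlocutor {
        public String name;
    }

    /**
     * Reads a top-level JSON array element by element and passes each
     * deserialized object to the handler. Returns the number of elements read.
     */
    public static long streamArray(InputStream in, Consumer<Interlocutor> handler)
            throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        long count = 0;
        try (JsonParser parser = mapper.getFactory().createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected the document to start with a JSON array");
            }
            // Each iteration consumes exactly one object, including nested objects.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Interlocutor item = mapper.readValue(parser, Interlocutor.class);
                handler.accept(item);
                count++;
            }
        }
        return count;
    }
}
```

For the real file you would pass something like new FileInputStream(path) and a handler that persists or processes each object instead of collecting all of them. If you prefer a tree model, mapper.readTree(parser) can be used in place of mapper.readValue(parser, Interlocutor.class).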

You may also want to look at "Use Jackson To Stream Parse an Array of Json Objects", which matches your problem pretty well.


2 Comments

Your solution also works, but my objects have many dependencies (objects nested inside other objects), which is why I need a single way to read the JSON and convert it to objects. Thanks.
Well with this solution you could also read all objects into a set. It is actually the same solution as you've found, but instead of using Gson it would use Jackson.
2

Is there a way to read this file using BufferedReader and then to push object by object?

Of course not. Even if you can open this file, how would you store 40 GB of data as Java objects in memory? I doubt your computer has that much memory. (Technically, using ObjectMapper you would need about twice as much RAM: 40 GB to hold the JSON plus 40 GB to hold the results as Java objects, so about 80 GB.)

I think you should use any of the approaches from these questions, but store the information in a database or in files instead of in memory. For example, if you have millions of rows in the JSON, you should parse each row and save it to the database without keeping everything in memory. Then you can fetch the data back from the database step by step (for example, no more than 1 GB at a time).
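A rough sketch of the "parse and save without keeping it all in memory" idea, using only the standard library. BatchedPersister and persistInBatches are hypothetical names; the source iterator stands in for whatever streaming parser produces the rows, and the sink stands in for a database write (for example a JDBC batch insert), neither of which is shown here:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedPersister {

    /**
     * Drains the source in fixed-size batches so that this method never
     * buffers more than batchSize items. Returns the number of items handed
     * to the sink.
     */
    public static <T> long persistInBatches(Iterator<T> source, int batchSize,
                                            Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                // Hand over a copy so the buffer can be reused safely.
                sink.accept(new ArrayList<>(batch));
                total += batch.size();
                batch.clear();
            }
        }
        // Flush the last, possibly smaller, batch.
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch));
            total += batch.size();
        }
        return total;
    }
}
```

The batch size is the knob that trades memory for the number of round trips to the database; the right value depends on the size of a single row and is something you would have to measure.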

1 Comment

Well, it is possible in theory, as SAX (for XML) proves. Of course you can't have the entire document in memory at once, but you could read parts of the structure, write them into a database or into smaller documents for individual objects, drop them from memory, and repeat. I do not know of any implementation that does this, though.
