16

I'm trying to decode a large JSON file (222 MB).

I understand I can't use json_decode() directly by reading the whole file with file_get_contents() and decoding the whole string, as it consumes a lot of memory and returns nothing (which is what it has been doing so far).

So I went to try out libraries. The one I tried recently is JSONParser, which reads the objects of a JSON array one by one.

But due to the lack of documentation there, I want to ask here if anyone has worked with this library.

This is the example test code from GitHub:

// initialise the parser object
$parser = new JSONParser();

// register the callbacks (names of functions defined by you)
$parser->setArrayHandlers('arrayStart', 'arrayEnd');
$parser->setObjectHandlers('objStart', 'objEnd');
$parser->setPropertyHandler('property');
$parser->setScalarHandler('scalar');

/*
echo "Parsing top level object document...\n";
// parse the document
$parser->parseDocument(__DIR__ . '/data.json');
*/

$parser->initialise();

// echo "Parsing top level array document...\n";
// parse the top level array
$parser->parseDocument(__DIR__ . '/array.json');
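
For context, the handler names above refer to plain functions that you define yourself. Their exact signatures are not documented, so the stubs below are only an assumption of what the event callbacks might look like:

// hypothetical callback stubs -- the signatures are assumed, not documented
function arrayStart()    { echo "array start\n"; }
function arrayEnd()      { echo "array end\n"; }
function objStart()      { echo "object start\n"; }
function objEnd()        { echo "object end\n"; }
function property($name) { echo "property: $name\n"; }
function scalar($value)  { echo "scalar: "; var_dump($value); }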

How can I use a loop and save each object into a PHP variable that we can easily decode to a PHP array for further use?

This would take some time, since it processes every object of the JSON array one by one, but the question stands: how do I loop over it using this library, or is there no such option?

Or are there any better options or libraries for this sort of job?

1
  • You should not have to use a loop for what you want. The parser will just emit events, which your callbacks should handle and do whatever they need to do with the data. Alternatively, there is github.com/salsify/jsonstreamingparser. I cannot vouch for either library, though, so you will have to check them out yourself. Commented Jun 17, 2016 at 19:31

4 Answers

17

Another alternative is to use halaxa/json-machine.

Usage for iterating over JSON is the same as with json_decode, but it will not hit the memory limit no matter how big your file is. There is no need to implement anything; just write your foreach.

Example:

$users = \JsonMachine\JsonMachine::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // process $user as usual
}

See the GitHub README for more details.
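
Note that newer releases of json-machine (1.0 and later) renamed the entry point; assuming such a version, the equivalent call would be:

use JsonMachine\Items;

$users = Items::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // process $user as usual
}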


2 Comments

Is it possible to easily modify sections of a huge JSON file using halaxa/json-machine? E.g. using JSON Pointer syntax? Thank you!
@tonix You cannot directly modify the file you are reading, but you can start iteration of file A and write sections one by one into a new file B. You can of course modify any section before writing it into the new file. That way you can accomplish what you want. The only difference is that desired modifications will end up in a new file.
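
A minimal sketch of that read-A, write-B approach (assuming json-machine 1.0+, a top-level JSON array, and illustrative file names):

use JsonMachine\Items;

// stream items from the source file one by one
$items = Items::fromFile('fileA.json');
$out = fopen('fileB.json', 'w');

fwrite($out, '[');
$first = true;
foreach ($items as $item) {
    // modify $item here before it is written out
    fwrite($out, ($first ? '' : ',') . json_encode($item));
    $first = false;
}
fwrite($out, ']');
fclose($out);
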
8

One alternative here is to use salsify/jsonstreamingparser.

You need to create your own Listener.

$testfile = '/path/to/file.json';
$listener = new MyListener();
$stream = fopen($testfile, 'r');
try {
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
    fclose($stream);
} catch (Exception $e) {
    fclose($stream);
    throw $e;
}

To make things simple to understand, I'm using this JSON as an example:

JSON Input

{
    "objects": [
    {
        "propertyInt": 1,
        "propertyString": "string",
        "propertyObject": { "key": "value" }            
    },
    {
        "propertyInt": 2,
        "propertyString": "string2",
        "propertyObject": { "key": "value2" }
    }]
}

You need to implement your own listener. In this case, I just want to get the objects inside the array.

PHP

class MyListener extends \JsonStreamingParser\Listener\InMemoryListener
{
    // control variable that lets us know whether we are inside a child
    // or a parent object
    protected $level = 0;

    protected function startComplexValue($type)
    {
        // a complex value (object or array) starts: increment our level
        $this->level++;
        parent::startComplexValue($type);
    }

    protected function endComplexValue()
    {
        // a complex value ends: decrement our level
        $this->level--;
        $obj = array_pop($this->stack);
        // If the value stack is now empty, we're done parsing the document,
        // so we can move the result into place so that getJson() can return
        // it. Otherwise, we associate the value with its parent.
        if (empty($this->stack)) {
            $this->result = $obj['value'];
        } else {
            if ($obj['type'] == 'object') {
                // insert the value into the parent, as the original listener does
                $this->insertValue($obj['value']);
                // HERE I call the custom function to do what I want
                $this->insertObj($obj);
            }
        }
    }

    // custom function to do whatever we need with each object
    protected function insertObj($obj)
    {
        // only handle the objects of the top-level array
        if ($this->level <= 2) {
          echo "<pre>";
          var_dump($obj);
          echo "</pre>";
        }
    }
}

Output

array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(1)
    ["propertyString"]=>
    string(6) "string"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(5) "value"
    }
  }
}
array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(2)
    ["propertyString"]=>
    string(7) "string2"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(6) "value2"
    }
  }
}

I tested it against a 166 MB JSON file and it works. You may need to adapt the listener to your needs.

3 Comments

Use "Guzzle, PHP HTTP client" to download the json from url without loading it to memory.
A question: why is "$this->insertValue($obj['value']);" required? It only increases the memory usage, doesn't it? Can you strip it out if you don't need the full output?
@Felipe Duarte, can Salsify's JSON streaming parser be used for a large array object instead of a file, e.g. an array from a result set from a database?
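
On the memory question above: one possible tweak (an assumption based on the stack layout visible in the listener, not a documented API) is to discard each record from its parent container once it has been processed, so the in-memory result stays flat:

protected function insertObj($obj)
{
    if ($this->level <= 2) {
        $this->processRecord($obj['value']); // hypothetical per-record handler
        // Drop the record we just processed from the enclosing array so
        // the accumulated result does not keep growing (assumes the parent
        // container is the top element of $this->stack).
        $top = count($this->stack) - 1;
        if ($top >= 0 && is_array($this->stack[$top]['value'])) {
            array_pop($this->stack[$top]['value']);
        }
    }
}
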
1

Not a reply intended for the OP, but an alternative method for anyone else looking into this topic...

You CAN use json_decode() on a file of any size with next to no memory use. Yep, the best of both worlds. I tried several solutions, such as json-machine and json_decode() as designed; some methods were fast but crashed on memory while digesting the entire file at once, while others completed but were painfully slow.

My solution is to break the JSON file apart into smaller sections and process each with json_decode(). I did this by setting the head and the end of the JSON file to variables (or constants), then concatenating head + body excerpt + end and processing each batch separately, where a body excerpt was 200-400 records but can be anything the system can handle. I am sure some people will have something negative to say about this, but in essence it is the same as manually making many small JSON files and processing them individually. This method simply does it for you, is relatively fast, and can handle a file of literally any size. A rough sketch of the idea follows.
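
Here is a minimal sketch of that batching approach, assuming a top-level layout like {"records": [ ... ]} and a hypothetical handleBatch() doing the per-batch work; the brace counting is naive about braces inside string values, so it suits well-formed data:

function processInBatches(string $file, int $batchSize = 300): void
{
    $fh = fopen($file, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }

    $head    = '{"records":[';  // hard-coded head, as described above
    $end     = ']}';            // hard-coded end
    $depth   = 0;               // curly-brace depth
    $buffer  = '';              // the record currently being read
    $records = [];              // completed record strings in this batch

    while (($char = fgetc($fh)) !== false) {
        if ($char === '{') {
            $depth++;
            if ($depth === 2) {      // a top-level record begins
                $buffer = '';
            }
        }
        if ($depth >= 2) {
            $buffer .= $char;
        }
        if ($char === '}') {
            $depth--;
            if ($depth === 1) {      // the record just closed
                $records[] = $buffer;
                if (count($records) >= $batchSize) {
                    $batch = json_decode($head . implode(',', $records) . $end, true);
                    handleBatch($batch['records']); // hypothetical per-batch work
                    $records = [];
                }
            }
        }
    }
    if ($records !== []) {           // flush the final partial batch
        $batch = json_decode($head . implode(',', $records) . $end, true);
        handleBatch($batch['records']);
    }
    fclose($fh);
}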

My sample file had 1,177,437 records (3.8 GB) and involved several operations to prepare the data, such as many coordinate conversions, string manipulations, SQL queries to retrieve additional data to be included, and gzdeflate(). It created SQL statements that were executed, completing in 37 minutes with no errors and averaging 530 SQL records created per second. The table ended up being 5.2 GB when all was said and done. If you know the file(s) will be formatted 100% correctly, this can be sped up by reading an entire line at a time as opposed to one character at a time. I opted for one character at a time because on occasion I get GeoJSON files with no line breaks, and I designed it for maximum compatibility first, speed second.

Tips: I found that preg_match() worked well to extract the head of the file, while simply looking for an equal quantity of opening and closing curly brackets within a string indicated a complete record. The end of the file was a simple "\n]\n}\n" that I hard-coded because it is common to all my files.
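
For illustration, the head extraction could look something like this (the pattern is an assumption about the layout described above, not necessarily what was used):

// Read only the first few KB and capture everything up to and including
// the opening bracket of the records array as the reusable head.
$chunk = file_get_contents($file, false, null, 0, 4096);
if (preg_match('/^\s*\{.*?\[/s', $chunk, $m)) {
    $head = $m[0];
}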


-13

You still need to use json_decode and file_get_contents to get the full JSON (you can't parse partial JSON). Just increase the memory limit for PHP to a bigger value using ini_set('memory_limit', '500M');

Also, you will be processing for longer, so use set_time_limit(0);

4 Comments

Not sure where that upvote came from, but this answer is simply not true.
I have already used ini_set('memory_limit', '-1');, but I will use the time limit you suggested. But the question I asked is how to loop. I know I have to use json_decode; I also know I can't use it on the full file, only on chunks etc. And yes, your answer is not true: we can parse partial JSON; there are libraries doing that.
@PeeHaa - why is this answer not true? I'm reading it here and I'd really, really like to know why. Should I telepathically read your mind or should I blindly believe your statement without evidence or counter-argument?
The question states at the beginning that the string is too large to fit in memory.
