
Parse lets users download their data through its Export tool, but only in JSON format. I want this data in CSV format for analysis in Excel.

While a simple script suffices for smaller JSON objects, I am dealing with a dataset that is 670,000 rows and over 360MB. Online converters cannot handle this file size, frequently failing with errors that PHP has exceeded its memory limit.

I have tried PHP CLI-based scripts and online converters, but they all seem to exceed their allocated memory. I figured I needed a new approach when ini_set('memory_limit', '4096M'); still didn't give me enough memory.

I am currently using this CLI-based script for parsing data:

// flatten to CSV
function flatten2CSV($file){
    $fileIO = fopen($file, 'w+');
    foreach ($this->dataArray as $items) {
        $flatData = array();
        // flatten nested values depth-first into a single flat list
        $fields = new RecursiveIteratorIterator(new RecursiveArrayIterator($items));
        foreach($fields as $value) {
            array_push($flatData, $value);
        }
        fputcsv($fileIO, $flatData, ";", '"');
    }
    fclose($fileIO);
}

// and $this->dataArray is created here
function readJSON($JSONdata){
    // json_decode() loads the entire dataset into memory at once
    $this->dataArray = json_decode($JSONdata,1);
    $this->prependColumnNames();
    return $this->dataArray;
}

private function prependColumnNames(){
    // build a header row from the first record's keys
    $keys = array();
    foreach(array_keys($this->dataArray[0]) as $key){
        $keys[0][$key] = $key;
    }
    $this->dataArray = array_merge($keys, $this->dataArray);
}

How can I solve memory management issues with PHP and parsing through this large dataset? Is there a better way to read in JSON objects than json_decode for large datasets?

  • Is there a need to process all of the files in one batch? If it is possible to export the JSON in batches and then do individual conversions, you may not need as much active memory. Do you have code you are using? Also, is PHP a requirement for this use case? Commented Apr 17, 2015 at 21:34
  • One could export the data from Parse in smaller datasets, but this becomes time-consuming when pulling data from the database regularly. I've added the JSON2CSV class that I am using to parse the JSON. Commented Apr 17, 2015 at 21:37
  • I am not familiar with Parse. Could you write a script to pull down the smaller file sets? Also, if you are running this in a web page to do the conversion, that doesn't make much sense if you don't need the input on screen; it makes more sense to run it as a batch job on the command line, scheduled, etc. Can you clarify why you are trying to do this in the browser? Commented Apr 17, 2015 at 21:41
  • So Parse is a popular backend database and API combination used for mobile app development. I am not trying to do this in the browser exclusively - I am open to any method to change JSON into CSV. Writing a script would incur I/O, which is problematic as Parse throttles API calls. Commented Apr 17, 2015 at 21:44

3 Answers


If you're able to run a script in the browser, check out the PapaParse JavaScript library -- it supports chunking and multi-threading for larger datasets and can convert JSON to CSV.

Specific config options that may be relevant:

  • worker
  • chunk
  • fastMode

Alternatively, there is a fork of PapaParse for Node.js, though without the worker and chunk options.

I have no affiliation with this library, but have used it successfully for CSV to JSON conversions on large datasets.



As it turns out, PHP does not natively support a streaming JSON parser (based on what I found doing some research). However, Salsify wrote an excellent blog post about how they created a streaming JSON parser for PHP.

This is the link to the GitHub code

Using their example.php file, I was able to successfully read in the JSON file to a PHP object.
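
For reference, the wiring looks roughly like this. This is a minimal sketch assuming the Composer package salsify/jsonstreamingparser and its bundled InMemoryListener; the exact class names may differ between versions of the library, so check its README:

// Minimal sketch, not the exact example.php from the repository.
// Assumes: composer require salsify/jsonstreamingparser
require 'vendor/autoload.php';

$listener = new \JsonStreamingParser\Listener\InMemoryListener();
$stream = fopen('parse_export.json', 'r');
try {
    // The parser consumes the stream incrementally rather than
    // loading the whole file at once the way json_decode() does.
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
    fclose($stream);
} catch (Exception $e) {
    fclose($stream);
    throw $e;
}

$data = $listener->getJson();

Note that InMemoryListener still accumulates the whole document in memory, which is why the memory_limit increase below was also necessary; a custom listener could instead write rows out as they are parsed.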

A few other items that I had to do to make this work:

  • Increase the memory limit for PHP: I changed the memory_limit in php.ini to read as memory_limit=2048M
  • Modify the flatten2CSV() function: my new code needed to account for the Parse JSON format, which wraps everything in { "results": [ objects ] }. The new function is:

    function flatten2CSV($file, $data){
        $fileIO = fopen($file, 'w+');
        // iterate over the objects inside Parse's "results" wrapper
        foreach ($data['results'] as $items) {
            $flatData = array();
            $fields = new RecursiveIteratorIterator(new RecursiveArrayIterator($items));
            foreach($fields as $value) {
                array_push($flatData, $value);
            }
            fputcsv($fileIO, $flatData, ";", '"');
        }
        fclose($fileIO);
    }
    
  • Manually add the headers: for the purpose of this exercise, the above code was sufficient for me to parse my file. However, I did have to manually add the header line to my CSV file. I'd suggest writing code to pull out the keys and write them as a header row (see the sketch after this list).
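
As a rough illustration of that suggestion, here is a hypothetical writeHeader() helper; the function name is my own, and it assumes the same { "results": [ objects ] } wrapper as above:

// Hypothetical helper: write one header row built from the keys of
// the first result object, using the same delimiter as the data rows.
function writeHeader($fileIO, $data) {
    if (!empty($data['results'])) {
        fputcsv($fileIO, array_keys($data['results'][0]), ";", '"');
    }
}

Calling this right after the fopen() in flatten2CSV() would put the header line first. Note that it only emits top-level keys, so deeply nested objects would still produce more flattened values than header columns.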

YMMV with this function. Because I had to modify it specifically for the Parse JSON, other JSON structures might not work with it. My Parse object isn't too complex, so arrays of Pointers may break it.



You can try https://github.com/jehiah/json2csv

To convert input like this (one JSON object per line):

{"user": {"name":"jehiah", "password": "root"}, "remote_ip": "127.0.0.1", "dt" : "[20/Aug/2010:01:12:44 -0400]"}
{"user": {"name":"jeroenjanssens", "password": "123"}, "remote_ip": "192.168.0.1", "dt" : "[20/Aug/2010:01:12:44 -0400]"}
{"user": {"name":"unknown", "password": ""}, "remote_ip": "76.216.210.0", "dt" : "[20/Aug/2010:01:12:45 -0400]"}

to:

"jehiah","127.0.0.1"
"jeroenjanssens","192.168.0.1"
"unknown","76.216.210.0"

you would run:

json2csv -k user.name,remote_ip -i input.json -o output.csv

