The fundamental difference between JSON and CSV is that JSON is a tree structure[0], and CSV is a table.
Visually, we want to transform something that looks like this:
{"a":{"d":0},
"b":{"d":1,"e":2},
"c":{"e":3}}
into something that looks like this:
+-+-+-+
| |d|e|
+-+-+-+
|a|0| |
+-+-+-+
|b|1|2|
+-+-+-+
|c| |3|
+-+-+-+
Note that the second object ("b") in the root JSON object has two values, but the first and third objects don't. JSON data may not have the same number/names of fields in each sub-object. That means we need to add empty values to the first and third JSON objects, so that when they are written to the CSV, they have the same number of columns, in the same order as the largest JSON object.
In the example above, both the "a" and "c" objects need a missing "d" or "e" value to have the same number of columns as the "b" object.
We'll need two passes: one to find the largest object, and a second to add the empty values to the smaller objects before printing out each CSV row.
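To picture that second step, here is one way to pad the "a" object (illustration only; the code later in this answer fills in the blanks on the fly while writing each row, rather than modifying the arrays):
$columns = ['d', 'e']; // keys of the largest object, "b"
$a = ['d' => 0];
$a_padded = array_merge(array_fill_keys($columns, ''), $a);
// $a_padded is now ['d' => 0, 'e' => '']: same keys, same order as "b"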
json_decode[1] takes a second boolean argument, $associative, which when true decodes JSON objects into associative arrays. We'll use that to take advantage of PHP's built-in array functions.
To demonstrate with the JSON above:
$json = '{"a":{"d":0},"b":{"d":1,"e":2},"c":{"e":3}}';
$json_arr = json_decode($json, true);
var_dump($json_arr);
output:
array(3) {
  'a' =>
  array(1) {
    'd' =>
    int(0)
  }
  'b' =>
  array(2) {
    'd' =>
    int(1)
    'e' =>
    int(2)
  }
  'c' =>
  array(1) {
    'e' =>
    int(3)
  }
}
I've come up with this function to find the largest subarray in a multidimensional array:
$longest_count = 0; // number of keys in the largest subarray found so far
$longest_k = '';    // top-level key of that subarray

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
    global $longest_count, $longest_k;
    foreach($arr as $k => $v){
        if(is_array($v)){
            // keys at this level plus everything already counted above it
            if(count($v)+$last_count > $longest_count){
                $longest_count = $last_count+count($v);
                $longest_k = $last_k ?? $k; // credit the parent key when inside a nested array
            }
            // recurse so nested arrays add to the count
            find_largest_subarray($v, count($v)+$last_count, $k);
        }
    }
}
It uses two global variables to keep track of the longest subarray found so far and its key.
We can use that on our $json_arr from above:
find_largest_subarray($json_arr);
echo "longest count = $longest_count \n";
echo "longest k = $longest_k \n";
output:
longest count = 2
longest k = b
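As an aside, the same result can be had without globals by counting keys recursively and returning the winning key. This is just a sketch of an alternative; the rest of this answer sticks with the version above:
function count_keys_deep($arr){
    $n = 0;
    foreach($arr as $v){
        $n++; // count this key
        if(is_array($v)){
            $n += count_keys_deep($v); // plus every key nested under it
        }
    }
    return $n;
}
function largest_subarray_key($arr){
    $best_count = 0;
    $best_k = null;
    foreach($arr as $k => $v){
        if(is_array($v) && count_keys_deep($v) > $best_count){
            $best_count = count_keys_deep($v);
            $best_k = $k;
        }
    }
    return $best_k; // 'b' for the $json_arr above
}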
So that's our first pass: we have found the largest object. We still need to add the missing values. To know which values are missing, we need to know which keys are present in the largest object; then we'll know which keys are missing from the smaller objects. Because we now know the largest subarray, we can use this[2] SO answer to get all the keys of this (potentially) multidimensional array:
function array_keys_multi(array $array){
    $keys = array();
    foreach ($array as $key => $value) {
        $keys[] = $key;
        if (is_array($value)) {
            $keys = array_merge($keys, array_keys_multi($value));
        }
    }
    return $keys;
}
$header_columns = array_keys_multi($json_arr[$longest_k]);
var_dump($header_columns);
output:
array(2) {
  [0] =>
  string(1) "d"
  [1] =>
  string(1) "e"
}
We now have almost everything we need. We still need a way to flatten the (potentially) multidimensional array. We need to flatten the array in case the JSON data has a sub-subarray like this:
$json = '{"a":{"d":0},"b":{"d":1,"e":2, "f":{"g":4}},"c":{"e":3}}'; // added "f" object
This function flattens the array[3]:
function flatten($arr){
    $return = [];
    // RecursiveIteratorIterator visits every leaf value of the nested array
    $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
    foreach ($it as $key => $val){
        $return[$key] = $val; // leaf keys become top-level keys
    }
    return $return;
}
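To see what it does, flatten the "b" object from the example with the added "f" sub-object; the leaf value is pulled up to the top level:
$b = ['d' => 1, 'e' => 2, 'f' => ['g' => 4]];
$flat = flatten($b);
// $flat is now ['d' => 1, 'e' => 2, 'g' => 4]: the "f" wrapper is gone and "g" can become a column of its own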
Now we have everything we need. We'll use an outer loop through the JSON array (converted with json_decode), and an inner loop through the $header_columns array to fill in the blanks if a row is missing a key.
$csv_out = implode(',', $header_columns) . "\r\n"; // header for CSV
foreach($json_arr as $k => $v){
    $v = flatten($v);
    foreach($header_columns as $col){
        $csv_out .= '"' . ($v[$col] ?? '') . '",'; // double-quote and add comma
    }
    $csv_out = rtrim($csv_out, ','); // remove trailing comma
    $csv_out .= "\r\n";
}
echo $csv_out;
output:
d,e
"0",""
"1","2"
"","3"
That's our CSV! Here is the complete code, with all variables and functions declared up top, intermediate output statements removed, and the OP's (cleaned) JSON data:
<?php
$json = '[{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100355321","value_3":"XXXX","value_4":"12667","value_5":"6"},"stream_type":"COOKIE"},'.
        '{"type":"ATTRIBUTED","conversion":{"value_1":"000000167865321","value_3":"YYYY","value_4":"12668","value_5":"0"},"stream_type":"COOKIE"},'.
        '{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000134535321","value_3":"AAAA","value_4":"12669","value_5":"9"},"stream_type":"COOKIE"},'.
        '{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100357651","value_3":"WWWW","value_4":"12670","value_5":"2"},"stream_type":"COOKIE"}]';

$json_arr = json_decode($json, true);
$longest_count = 0;
$longest_k = '';
$header_columns = [];
$csv_out = '';

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
    global $longest_count, $longest_k;
    foreach($arr as $k => $v){
        if(is_array($v)){
            if(count($v)+$last_count > $longest_count){
                $longest_count = $last_count+count($v);
                $longest_k = $last_k ?? $k;
            }
            find_largest_subarray($v, count($v)+$last_count, $k);
        }
    }
}

function flatten($arr){
    $return = [];
    $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
    foreach ($it as $key => $val){
        $return[$key] = $val;
    }
    return $return;
}

function array_keys_multi(array $array){
    $keys = array();
    foreach ($array as $key => $value) {
        $keys[] = $key;
        if (is_array($value)) {
            $keys = array_merge($keys, array_keys_multi($value));
        }
    }
    return $keys;
}

find_largest_subarray($json_arr);
$header_columns = array_keys_multi($json_arr[$longest_k]);
$csv_out = implode(',', $header_columns) . "\r\n"; // header for CSV
foreach($json_arr as $k => $v){
    $v = flatten($v);
    foreach($header_columns as $col){
        $csv_out .= '"' . ($v[$col] ?? '') . '",'; // double-quote and add comma
    }
    $csv_out = rtrim($csv_out, ','); // remove trailing comma
    $csv_out .= "\r\n";
}
echo $csv_out;
output:
type,conversion,value_1,value_3,value_4,value_5,stream_type
"NON_ATTRIBUTED","","000000100355321","XXXX","12667","6","COOKIE"
"ATTRIBUTED","","000000167865321","YYYY","12668","0","COOKIE"
"NON_ATTRIBUTED","","000000134535321","AAAA","12669","9","COOKIE"
"NON_ATTRIBUTED","","000000100357651","WWWW","12670","2","COOKIE"
Note that the "conversion" key is included in the header, but has an empty value in every row. That's because the "conversion" value is an array, which flatten() collapses into its sub-keys (value_1, value_3, and so on), leaving nothing under "conversion" itself. One nice thing about this approach is that it still works if some "conversion" values are arrays and others are plain integers.
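For example, with a hypothetical input where the second row has a plain number for "conversion" instead of an object, flatten() keeps the scalar under its own key, so it lands in the "conversion" column and the value_* columns simply stay empty for that row:
$json = '[{"type":"A","conversion":{"value_1":"1","value_3":"X"},"stream_type":"COOKIE"},'.
        '{"type":"B","conversion":42,"stream_type":"COOKIE"}]';
// resulting CSV:
// type,conversion,value_1,value_3,stream_type
// "A","","1","X","COOKIE"
// "B","42","","","COOKIE"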
A warning about this approach: if the JSON data has objects with fields that are not a subset of the largest object's fields, those fields won't be included in the CSV.
For example:
$json = '{"a":{"f":0},"b":{"d":1,"e":2},"c":{"e":3}}';
In that case the $header_columns array won't include the field "f", so the "a" row will have nothing but empty values and the 0 will be silently dropped.
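If that matters for your data, one possible workaround (a sketch, not what the code above does) is to build the header from the union of keys across every row instead of only the largest one, which also makes the find_largest_subarray() pass unnecessary:
$header_columns = [];
foreach($json_arr as $v){
    // collect keys from every row, keeping first-seen order and dropping duplicates
    $header_columns = array_values(array_unique(array_merge($header_columns, array_keys_multi($v))));
}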
References:
[0] https://en.wikipedia.org/wiki/Tree_structure
[1] https://www.php.net/manual/en/function.json-decode.php
[2] How to get all the key in multi-dimensional array in php
[3] How to Flatten a Multidimensional Array?