The fundamental difference between JSON and CSV is that JSON is a tree structure[0], and CSV is a table.
Visually, we want to transform something that looks like this:
{"a":{"d":0},
"b":{"d":1,"e":2},
"c":{"e":3}}
into something that looks like this:
+-+-+-+
| |d|e|
+-+-+-+
|a|0| |
+-+-+-+
|b|1|2|
+-+-+-+
|c| |3|
+-+-+-+
Note that the second object ("b") in the root JSON object has two values, but the first and third objects don't. JSON data may not have the same number/names of fields in each sub-object. That means we need to add empty values to the first and third JSON objects, so that when they are written to the CSV, they have the same number of columns, in the same order as the largest JSON object.
In the example above, both the "a" and "c" objects need a missing "d" or "e" value to have the same number of columns as the "b" object.
We'll need two passes: one to find the largest object, and a second to add the empty values to the smaller objects before printing out each CSV row.
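To picture that second step, here is one way to pad the "a" object (illustration only; the code later in this answer fills in the blanks on the fly while writing each row, rather than modifying the arrays):
$columns = ['d', 'e']; // keys of the largest object, "b"
$a = ['d' => 0];
$a_padded = array_merge(array_fill_keys($columns, ''), $a);
// $a_padded is now ['d' => 0, 'e' => '']: same keys, same order as "b"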
json_decode[1] takes a second boolean argument, $associative, which when true decodes JSON objects into associative arrays. We'll use that to take advantage of PHP's built-in array functions.
To demonstrate with the JSON above:
$json = '{"a":{"d":0},"b":{"d":1,"e":2},"c":{"e":3}}';
$json_arr = json_decode($json, true);
var_dump($json_arr);
output:
array(3) {
  'a' =>
  array(1) {
    'd' =>
    int(0)
  }
  'b' =>
  array(2) {
    'd' =>
    int(1)
    'e' =>
    int(2)
  }
  'c' =>
  array(1) {
    'e' =>
    int(3)
  }
}
I've come up with this function to find the largest subarray in a multidimensional array:
$longest_count = 0; // number of keys in the largest subarray found so far
$longest_k = '';    // top-level key of that subarray

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
    global $longest_count, $longest_k;
    foreach($arr as $k => $v){
        if(is_array($v)){
            // keys at this level plus everything already counted above it
            if(count($v)+$last_count > $longest_count){
                $longest_count = $last_count+count($v);
                $longest_k = $last_k ?? $k; // credit the parent key when inside a nested array
            }
            // recurse so nested arrays add to the count
            find_largest_subarray($v, count($v)+$last_count, $k);
        }
    }
}
It uses two global variables to keep track of the longest subarray found so far and its key.
We can use that on our $json_arr from above:
find_largest_subarray($json_arr);
echo "longest count = $longest_count \n";
echo "longest k = $longest_k \n";
output:
longest count = 2
longest k = b
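As an aside, the same result can be had without globals by counting keys recursively and returning the winning key. This is just a sketch of an alternative; the rest of this answer sticks with the version above:
function count_keys_deep($arr){
    $n = 0;
    foreach($arr as $v){
        $n++; // count this key
        if(is_array($v)){
            $n += count_keys_deep($v); // plus every key nested under it
        }
    }
    return $n;
}
function largest_subarray_key($arr){
    $best_count = 0;
    $best_k = null;
    foreach($arr as $k => $v){
        if(is_array($v) && count_keys_deep($v) > $best_count){
            $best_count = count_keys_deep($v);
            $best_k = $k;
        }
    }
    return $best_k; // 'b' for the $json_arr above
}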
So that's our first pass: we have found the largest object. We still need to add the missing values. To know which values are missing, we need to know which keys are present in the largest object; then we'll know which keys are missing from the smaller objects. Because we now know the largest subarray, we can use this[2] SO answer to get all the keys of this (potentially) multidimensional array:
function array_keys_multi(array $array){
    $keys = array();
    foreach ($array as $key => $value) {
        $keys[] = $key;
        if (is_array($value)) {
            $keys = array_merge($keys, array_keys_multi($value));
        }
    }
    return $keys;
}
$header_columns = array_keys_multi($json_arr[$longest_k]);
var_dump($header_columns);
output:
array(2) {
  [0] =>
  string(1) "d"
  [1] =>
  string(1) "e"
}
We now have almost everything we need. We still need a way to flatten the (potentially) multidimensional array. We need to flatten the array in case the JSON data has a sub-subarray like this:
$json = '{"a":{"d":0},"b":{"d":1,"e":2, "f":{"g":4}},"c":{"e":3}}'; // added "f" object
This function flattens the array[3]:
function flatten($arr){
    $return = [];
    // RecursiveIteratorIterator visits every leaf value of the nested array
    $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
    foreach ($it as $key => $val){
        $return[$key] = $val; // leaf keys become top-level keys
    }
    return $return;
}
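To see what it does, flatten the "b" object from the example with the added "f" sub-object; the leaf value is pulled up to the top level:
$b = ['d' => 1, 'e' => 2, 'f' => ['g' => 4]];
$flat = flatten($b);
// $flat is now ['d' => 1, 'e' => 2, 'g' => 4]: the "f" wrapper is gone and "g" can become a column of its own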
Now we have everything we need. We'll use an outer loop through the JSON array (converted with json_decode), and an inner loop through the $header_columns array to fill in the blanks if a row is missing a key.
$csv_out = implode(',', $header_columns) . "\r\n"; // header for CSV
foreach($json_arr as $k => $v){
    $v = flatten($v);
    foreach($header_columns as $col){
        $csv_out .= '"' . ($v[$col] ?? '') . '",'; // double-quote and add comma
    }
    $csv_out = rtrim($csv_out, ','); // remove trailing comma
    $csv_out .= "\r\n";
}
echo $csv_out;
output:
d,e
"0",""
"1","2"
"","3"
That's our CSV! Here is the complete code, with all variables and functions declared up top, intermediate output statements removed, and the OP's (cleaned) JSON data:
<?php
$json = '[{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100355321","value_3":"XXXX","value_4":"12667","value_5":"6"},"stream_type":"COOKIE"},'.
        '{"type":"ATTRIBUTED","conversion":{"value_1":"000000167865321","value_3":"YYYY","value_4":"12668","value_5":"0"},"stream_type":"COOKIE"},'.
        '{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000134535321","value_3":"AAAA","value_4":"12669","value_5":"9"},"stream_type":"COOKIE"},'.
        '{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100357651","value_3":"WWWW","value_4":"12670","value_5":"2"},"stream_type":"COOKIE"}]';

$json_arr = json_decode($json, true);
$longest_count = 0;
$longest_k = '';
$header_columns = [];
$csv_out = '';

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
    global $longest_count, $longest_k;
    foreach($arr as $k => $v){
        if(is_array($v)){
            if(count($v)+$last_count > $longest_count){
                $longest_count = $last_count+count($v);
                $longest_k = $last_k ?? $k;
            }
            find_largest_subarray($v, count($v)+$last_count, $k);
        }
    }
}

function flatten($arr){
    $return = [];
    $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
    foreach ($it as $key => $val){
        $return[$key] = $val;
    }
    return $return;
}

function array_keys_multi(array $array){
    $keys = array();
    foreach ($array as $key => $value) {
        $keys[] = $key;
        if (is_array($value)) {
            $keys = array_merge($keys, array_keys_multi($value));
        }
    }
    return $keys;
}

find_largest_subarray($json_arr);
$header_columns = array_keys_multi($json_arr[$longest_k]);
$csv_out = implode(',', $header_columns) . "\r\n"; // header for CSV
foreach($json_arr as $k => $v){
    $v = flatten($v);
    foreach($header_columns as $col){
        $csv_out .= '"' . ($v[$col] ?? '') . '",'; // double-quote and add comma
    }
    $csv_out = rtrim($csv_out, ','); // remove trailing comma
    $csv_out .= "\r\n";
}
echo $csv_out;
output:
type,conversion,value_1,value_3,value_4,value_5,stream_type
"NON_ATTRIBUTED","","000000100355321","XXXX","12667","6","COOKIE"
"ATTRIBUTED","","000000167865321","YYYY","12668","0","COOKIE"
"NON_ATTRIBUTED","","000000134535321","AAAA","12669","9","COOKIE"
"NON_ATTRIBUTED","","000000100357651","WWWW","12670","2","COOKIE"
Note that the "conversion" key is included in the header, but has an empty value in every row. That's because the "conversion" value is an array, which flatten() collapses into its sub-keys (value_1, value_3, and so on), leaving nothing under "conversion" itself. One nice thing about this approach is that it still works if some "conversion" values are arrays and others are plain integers.
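For example, with a hypothetical input where the second row has a plain number for "conversion" instead of an object, flatten() keeps the scalar under its own key, so it lands in the "conversion" column and the value_* columns simply stay empty for that row:
$json = '[{"type":"A","conversion":{"value_1":"1","value_3":"X"},"stream_type":"COOKIE"},'.
        '{"type":"B","conversion":42,"stream_type":"COOKIE"}]';
// resulting CSV:
// type,conversion,value_1,value_3,stream_type
// "A","","1","X","COOKIE"
// "B","42","","","COOKIE"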
A warning about this approach: if the JSON data has objects with fields that are not a subset of the largest object's fields, those fields won't be included in the CSV.
For example:
$json = '{"a":{"f":0},"b":{"d":1,"e":2},"c":{"e":3}}';
In that case the $header_columns array won't include the field "f", so the "a" row will have nothing but empty values and the 0 will be silently dropped.
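If that matters for your data, one possible workaround (a sketch, not what the code above does) is to build the header from the union of keys across every row instead of only the largest one, which also makes the find_largest_subarray() pass unnecessary:
$header_columns = [];
foreach($json_arr as $v){
    // collect keys from every row, keeping first-seen order and dropping duplicates
    $header_columns = array_values(array_unique(array_merge($header_columns, array_keys_multi($v))));
}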
References:
[0] https://en.wikipedia.org/wiki/Tree_structure
[1] https://www.php.net/manual/en/function.json-decode.php
[2] How to get all the key in multi-dimensional array in php
[3] How to Flatten a Multidimensional Array?