
I am trying to convert a json file into csv format using a php script. The code is as follows:

if (empty($argv[1]))
    die("The json file name or URL is missed\n");

$jsonFilename = $argv[1];

$json = file_get_contents($jsonFilename);
$array = json_decode($json, true);
$f = fopen('output.csv', 'w');

$firstLineKeys = false;
foreach ($array as $line)
{
    if (empty($firstLineKeys))
    {
            $firstLineKeys = array_keys($line);
            fputcsv($f, $firstLineKeys);
            $firstLineKeys = array_flip($firstLineKeys);
    }

    fputcsv($f, array_merge($firstLineKeys, $line));
}

This partly works, but it only returns the outer keys of the JSON file, and I am getting an "Array to string conversion" warning.

The JSON data looks like this:

{"type":"NON_ATTRIBUTED","conversion":{,"value_1":"000000100355321","value_3":"XXXX","value_4":"12667","value_5":"6"},"stream_type":"COOKIE"}
{"type":"ATTRIBUTED","conversion":{,"value_1":"000000167865321","value_3":"YYYY","value_4":"12668","value_5":"0"},"stream_type":"COOKIE"}
{"type":"NON_ATTRIBUTED","conversion":{,"value_1":"000000134535321","value_3":"AAAA","value_4":"12669","value_5":"9"},"stream_type":"COOKIE"}
{"type":"NON_ATTRIBUTED","conversion":{,"value_1":"000000100357651","value_3":"WWWW","value_4":"12670","value_5":"2"},"stream_type":"COOKIE"}

The output I am getting is:

type,conversion,stream_type
NON_ATTRIBUTED,Array,COOKIE
NON_ATTRIBUTED,Array,COOKIE

The output I am expecting is:

type,conversion,value_1,value_3,value_4,value_5,stream_type
NON_ATTRIBUTED,000000100355321,XXXX,1267,6,COOKIE
..

2 Answers

json_decode($json, true); converts JSON objects to associative arrays. So this

{
    "type":"NON_ATTRIBUTED",
    "conversion":{,
        "value_1":"000000100355321",
        "value_3":"XXXX",
        "value_4":"12667",
        "value_5":"6"
    },
    "stream_type":"COOKIE"
}

becomes this:

array(3) { 
    ["type"]=> string(14) "NON_ATTRIBUTED" 
    ["conversion"]=> array(4) { 
        ["value_1"]=> string(15) "000000100355321" 
        ["value_3"]=> string(4) "XXXX" 
        ["value_4"]=> string(5) "12667" 
        ["value_5"]=> string(1) "6" 
    } 
    ["stream_type"]=> string(6) "COOKIE" 
}

As you can see, there are nested arrays. You are trying to write all elements of the array to your text file (a CSV is just a simple text file) with this line:

fputcsv($f, array_merge($firstLineKeys, $line));

This works fine when every element of the array is a string. But when an element is itself an array, PHP raises the "Array to string conversion" warning. So you have to use a loop or array_merge on the nested array to prevent this.

I can't tell exactly how your CSV should look, but I hope this fix to your code helps. If not, write a comment below.

if (empty($argv[1])) die("The json file name or URL is missed\n");
$jsonFilename = $argv[1];

$json = file_get_contents($jsonFilename);
$array = json_decode($json, true);
$f = fopen('output.csv', 'w');

$firstLineKeys = false;
foreach ($array as $line)
{
    if (empty($firstLineKeys))
    {
        // Header: the outer keys plus the keys nested under "conversion",
        // so the header has the same number of columns as each row
        $firstLineKeys = array_merge(array('type'), array_keys($line['conversion']), array('stream_type'));
        fputcsv($f, $firstLineKeys);
    }
    // Row: type, then every nested "conversion" value, then stream_type
    $line_array = array($line['type']);
    foreach ($line['conversion'] as $value)
    {
        array_push($line_array, $value);
    }
    array_push($line_array, $line['stream_type']);
    fputcsv($f, $line_array);
}
fclose($f);

There is also a mistake in your JSON, an unneeded comma: "conversion":{,
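
The sample in the question also appears to hold one JSON object per line rather than a single JSON array. If the real file has that shape (an assumption on my part), json_decode on the whole file returns null and the foreach has nothing to iterate. Here is a minimal sketch of decoding it line by line; decode_json_lines is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: decode text that holds one JSON object per line.
// Lines that do not decode to an array are reported in $errors and skipped.
function decode_json_lines(string $text, array &$errors = []): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $lineNo => $raw) {
        $raw = trim($raw);
        if ($raw === '') {
            continue; // ignore blank lines
        }
        $row = json_decode($raw, true);
        if (!is_array($row)) {
            // e.g. the stray comma in "conversion":{, makes the line invalid
            $errors[] = 'Line ' . ($lineNo + 1) . ': ' . json_last_error_msg();
            continue;
        }
        $rows[] = $row;
    }
    return $rows;
}

// Sample input: the first line is valid, the second has the stray comma.
$text = '{"type":"A","conversion":{"value_1":"1"},"stream_type":"COOKIE"}' . "\n"
      . '{"type":"B","conversion":{,"value_1":"2"},"stream_type":"COOKIE"}' . "\n";

$errors = [];
$rows = decode_json_lines($text, $errors);
```

The resulting $rows can be passed to the foreach loop above in place of $array; any line that failed to decode is listed in $errors instead of silently breaking the output.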

5 Comments

@Applejack For if(empty($firstLineKeys)) does $firstLineKeys have to be true and thus empty for the statements within the if loop to run?
@pHorseSpec Those lines weren't related to the problem itself, so I didn't change them. If you need to convert some JSON to CSV, you should first decide on the structure of your CSV file, because JSON and CSV are different data structures: JSON is a tree and CSV is a table. The topic starter built his own structure but had a mistake in his code and in the JSON file, so all I did was fix it.
Is there any possible way to convert a multidimensional, tree-structured JSON format into CSV format?
Hello, @AndrewSurzhynskyi, urgent help needed regarding the same. Can we chat on something, will be very helpful of you.
@SiddharthChoudhary If you have a question - go ahead and ask it here so everyone could contribute.

The fundamental difference between JSON and CSV is that JSON is a tree structure[0], and CSV is a table.

Visually, we want to transform something that looks like this:

{"a":{"d":0},
 "b":{"d":1,"e":2},
 "c":{"e":3}}

into something that looks like this:

+-+-+-+
| |d|e|
+-+-+-+
|a|0| |
+-+-+-+
|b|1|2|
+-+-+-+
|c| |3|
+-+-+-+

Note that the second object ("b") in the root JSON object has two values, but the first and third objects don't. JSON data may not have the same number/names of fields in each sub-object. That means we need to add empty values to the first and third JSON objects, so that when they are written to the CSV, they have the same number of columns, in the same order as the largest JSON object.

In the example above, the "a" object needs an empty "e" value and the "c" object needs an empty "d" value, so that both have the same columns as the "b" object.

We'll need two passes. One to find the largest object, and a second to add the empty values to the smaller objects before printing out each CSV row.

json_decode[1] takes a second boolean argument, $associative, which when true decodes JSON objects into associative arrays. We'll use that to take advantage of PHP's built-in array functions.

To demonstrate with the JSON above:

$json = '{"a":{"d":0},"b":{"d":1,"e":2},"c":{"e":3}}';
$json_arr = json_decode($json, true);
var_dump($json_arr);

output:

array(3) {
  'a' =>
  array(1) {
    'd' =>
    int(0)
  }
  'b' =>
  array(2) {
    'd' =>
    int(1)
    'e' =>
    int(2)
  }
  'c' =>
  array(1) {
    'e' =>
    int(3)
  }
}

I've come up with this function to find the largest subarray in a multidimensional array:

$longest_count = 0;
$longest_k = '';

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
  global $longest_count, $longest_k;
  foreach($arr as $k => $v){
    if(is_array($v)){
      // $top_k stays the top-level key however deep the recursion goes
      $top_k = $last_k ?? $k;
      if(count($v)+$last_count > $longest_count){
        $longest_count = $last_count+count($v);
        $longest_k = $top_k;
      }
      find_largest_subarray($v, count($v)+$last_count, $top_k);
    }
  }
}

It uses 2 global variables to keep track of the longest subarray found so far, and its key.

We can use that on our $json_arr from above:

find_largest_subarray($json_arr);
echo "longest count = $longest_count \n";
echo "longest k = $longest_k \n";

output:

longest count = 2 
longest k = b
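
Because the function above stores its result in globals, those have to be reset before it can be reused on another array. As an alternative, here is a rough sketch that returns the key instead. count_leaves and key_of_largest_entry are hypothetical names, and this variant counts only leaf values, so it is not a drop-in equivalent of the counting rule above:

```php
<?php
// Hypothetical helper: count the scalar (leaf) values at any depth.
function count_leaves(array $arr): int
{
    $n = 0;
    foreach ($arr as $v) {
        $n += is_array($v) ? count_leaves($v) : 1;
    }
    return $n;
}

// Hypothetical helper: key of the top-level entry that holds the most leaves.
// Returns null for an empty array; ties go to the first entry seen.
function key_of_largest_entry(array $arr)
{
    $bestKey = null;
    $bestCount = -1;
    foreach ($arr as $k => $v) {
        $count = is_array($v) ? count_leaves($v) : 0;
        if ($count > $bestCount) {
            $bestCount = $count;
            $bestKey = $k;
        }
    }
    return $bestKey;
}

$json_arr = json_decode('{"a":{"d":0},"b":{"d":1,"e":2},"c":{"e":3}}', true);
$largest = key_of_largest_entry($json_arr); // "b" for this input
```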

So that's our first pass. We have found the largest object. We still need to add the missing values. To know which values are missing, we need to know which keys are present in the largest object, and then we'll know which keys are missing in the smaller objects. Because we now know the largest subarray, we can use this[2] SO answer to get all the keys of this (potentially) multidimensional array:

function array_keys_multi(array $array){
  $keys = array();
  foreach ($array as $key => $value) {
    $keys[] = $key;
    if (is_array($value)) {
      $keys = array_merge($keys, array_keys_multi($value));
    }
  }
  return $keys;
}

$header_columns = array_keys_multi($json_arr[$longest_k]);
var_dump($header_columns);

output:

array(2) {
  [0] =>
  string(1) "d"
  [1] =>
  string(1) "e"
}

We now have almost everything we need. We still need a way to flatten the (potentially) multidimensional array. We need to flatten the array in case the JSON data has a sub-subarray like this:

$json = '{"a":{"d":0},"b":{"d":1,"e":2, "f":{"g":4}},"c":{"e":3}}'; // added "f" object

This function flattens the array[3]:

function flatten($arr){
  $return = [];
  $it = new RecursiveIteratorIterator( new RecursiveArrayIterator($arr));
  foreach ($it as $key=>$val){
    $return[$key] = $val; 
  }
  return $return;
}
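
One thing to watch with flatten() is that it keeps only the innermost key of each value, so two nested objects that share a key name overwrite each other. If that can happen in your data, a possible variation is to record the path to each leaf in the key. This is only a sketch; flatten_with_path and the "." separator are my own choices, not anything standard:

```php
<?php
// Hypothetical variant of flatten(): each key is the full path to its leaf,
// e.g. "conversion.value_1", so equal key names in different branches differ.
function flatten_with_path(array $arr, string $prefix = ''): array
{
    $out = [];
    foreach ($arr as $key => $val) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        if (is_array($val)) {
            // Paths are unique, so the array union never drops a value
            $out += flatten_with_path($val, $path);
        } else {
            $out[$path] = $val;
        }
    }
    return $out;
}

$row = ['id' => 1, 'first' => ['id' => 2], 'second' => ['id' => 3]];
$flat = flatten_with_path($row);
// $flat is ['id' => 1, 'first.id' => 2, 'second.id' => 3]
```

The trade-off is that the CSV column names become paths such as conversion.value_1 rather than the bare value_1, so the header would have to be built from the same flattened keys.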

Now we have everything we need. We'll use an outer loop through the JSON array (converted with json_decode), and an inner loop through the $header_columns array to fill in the blanks if a row is missing a key.

$csv_out = (implode(',', $header_columns)) . "\r\n"; //header for CSV

foreach($json_arr as $k => $v){
  $v = flatten($v);
  foreach($header_columns as $col){
    $csv_out .= '"'. ($v[$col] ?? '') . '",'; //double-quote and add comma
  }
  $csv_out = rtrim($csv_out, ','); //remove trailing comma
  $csv_out .= "\r\n";
}

echo $csv_out;

output:

d,e
"0",""
"1","2"
"","3"

That's our CSV! Here is the code all together, with all variables and functions declared up top, intermediate output statements removed, and the OP's (cleaned) JSON data:

<?php
$json = '[{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100355321","value_3":"XXXX","value_4":"12667","value_5":"6"},"stream_type":"COOKIE"},'.
'{"type":"ATTRIBUTED","conversion":{"value_1":"000000167865321","value_3":"YYYY","value_4":"12668","value_5":"0"},"stream_type":"COOKIE"},'.
'{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000134535321","value_3":"AAAA","value_4":"12669","value_5":"9"},"stream_type":"COOKIE"},'.
'{"type":"NON_ATTRIBUTED","conversion":{"value_1":"000000100357651","value_3":"WWWW","value_4":"12670","value_5":"2"},"stream_type":"COOKIE"}]';
$json_arr = json_decode($json, true);
$longest_count = 0;
$longest_k = '';
$header_columns = [];
$csv_out = '';

function find_largest_subarray($arr, $last_count=0, $last_k=NULL){
  global $longest_count, $longest_k;
  foreach($arr as $k => $v){
    if(is_array($v)){
      // $top_k stays the top-level key however deep the recursion goes
      $top_k = $last_k ?? $k;
      if(count($v)+$last_count > $longest_count){
        $longest_count = $last_count+count($v);
        $longest_k = $top_k;
      }
      find_largest_subarray($v, count($v)+$last_count, $top_k);
    }
  }
}

function flatten($arr){
    $return = [];
    $it = new RecursiveIteratorIterator( new RecursiveArrayIterator($arr));
    foreach ($it as $key=>$val){
        $return[$key] = $val; 
    }
    return $return;
}

function array_keys_multi(array $array){
  $keys = array();
  foreach ($array as $key => $value) {
    $keys[] = $key;
    if (is_array($value)) {
      $keys = array_merge($keys, array_keys_multi($value));
    }
  }
  return $keys;
}

find_largest_subarray($json_arr);

$header_columns = array_keys_multi($json_arr[$longest_k]);

$csv_out .= (implode(',', $header_columns)) . "\r\n"; //header for CSV
    
foreach($json_arr as $k => $v){
  $v = flatten($v);
  foreach($header_columns as $col){
    $csv_out .= '"'. ($v[$col] ?? '') . '",'; //double-quote and add comma
  }
  $csv_out = rtrim($csv_out, ','); //remove trailing comma
  $csv_out .= "\r\n";
}

echo $csv_out;

output:

type,conversion,value_1,value_3,value_4,value_5,stream_type
"NON_ATTRIBUTED","","000000100355321","XXXX","12667","6","COOKIE"
"ATTRIBUTED","","000000167865321","YYYY","12668","0","COOKIE"
"NON_ATTRIBUTED","","000000134535321","AAAA","12669","9","COOKIE"
"NON_ATTRIBUTED","","000000100357651","WWWW","12670","2","COOKIE"

Note that the "conversion" key is included, but has an empty value for each row. This is because the "conversion" value is an array which is flattened into the other keys. One nice thing about this approach is that if some "conversion" values are arrays, and some are integers, this still works.
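
The loop above wraps each value in double quotes by hand, which does not escape a double quote that appears inside a value. If your data may contain quotes, commas or line breaks, PHP's fputcsv can do the quoting instead. This is a minimal sketch, assuming the rows have already been flattened and assuming PHP 7.4 or later (needed for the empty escape argument); rows_to_csv is a hypothetical helper name:

```php
<?php
// Hypothetical helper: render flattened rows as CSV text, letting fputcsv
// handle the quoting. The empty $escape argument needs PHP 7.4 or later.
function rows_to_csv(array $header, array $rows): string
{
    $fh = fopen('php://memory', 'w+');
    fputcsv($fh, $header, ',', '"', '');
    foreach ($rows as $row) {
        $line = [];
        foreach ($header as $col) {
            $line[] = $row[$col] ?? ''; // blank cell when the key is absent
        }
        fputcsv($fh, $line, ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

// The second row holds a value with embedded double quotes.
$rows = [['d' => 0], ['d' => 1, 'e' => 'x "y"']];
$csv = rows_to_csv(['d', 'e'], $rows);
echo $csv;
```

Note that fputcsv only quotes fields that need it and ends lines with "\n" by default, so the output is not byte-for-byte the same as the hand-built version.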

A warning about this approach -- if the JSON data has different objects with fields that are not a subset of fields of the largest object, those fields won't be included in the CSV.

For example:

$json = '{"a":{"f":0},"b":{"d":1,"e":2},"c":{"e":3}}';

In that case the $header_columns array won't include the field "f", so the "a" row will be empty.
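
One possible way around this limitation is to build the header from the union of the keys found in every row, rather than from the largest object alone. The sketch below reuses flatten() from above (repeated so the snippet runs on its own); union_of_keys is a hypothetical helper name:

```php
<?php
// Same flatten() as above: keeps the innermost key of every leaf value.
function flatten($arr){
  $return = [];
  $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
  foreach ($it as $key => $val){
    $return[$key] = $val;
  }
  return $return;
}

// Hypothetical helper: every leaf key seen in any row, in first-seen order.
function union_of_keys(array $rows): array
{
    $seen = [];
    foreach ($rows as $row) {
        foreach (array_keys(flatten($row)) as $key) {
            $seen[$key] = true; // array keys de-duplicate for us
        }
    }
    return array_keys($seen);
}

$json_arr = json_decode('{"a":{"f":0},"b":{"d":1,"e":2},"c":{"e":3}}', true);
$header_columns = union_of_keys($json_arr); // f, d, e for this input
```

Unlike array_keys_multi, this collects leaf keys only, so a container key such as "conversion" would not appear as an empty column.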

References:

[0] https://en.wikipedia.org/wiki/Tree_structure

[1] https://www.php.net/manual/en/function.json-decode.php

[2] How to get all the key in multi-dimensional array in php

[3] How to Flatten a Multidimensional Array?
