2

As this solution was adapted from a well-voted answer elsewhere, I was not expecting to encounter problems.

Problem: I want to LEFT JOIN file0.csv with file1.csv.

file0.csv

+-----------------+-------------+--------------+
| Manufacturer ID |    image    | description  |
+-----------------+-------------+--------------+
| SKU231          | image1.jpg  | A box.       |
| SKUAG1          | image22.jpg | Another box. |
| SKU21D          | image7a.png | A third box. |
+-----------------+-------------+--------------+

file1.csv:

+--------+--------+--------+-------+-------+
|  mpn   | length | height | width | title |
+--------+--------+--------+-------+-------+
| SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D |      5 |      8 |     5 | Box 3 |
+--------+--------+--------+-------+-------+

Desired result (file2.csv):

+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| SKU231          | image1.jpg  | A box.       | SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1          | image22.jpg | Another box. | SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+

PHP function to LEFT JOIN file0.csv with file1.csv on Manufacturer ID and mpn respectively:

function my_csv_join(array $csv_input_file_array, $csv_output_file, $html_preview, $left_join_on, $right_join_on = NULL) {
    if (count($csv_input_file_array) > 2) {
        echo 'This function probably only works for 2 csv files being joined at a time.  Reapply iteratively if needed.  Exiting now.';
        exit;
    } else {
        for ($x = 0; $x <= 1; $x++) {
            //get csv file to array 
            ${'file' . $x} = fopen($csv_input_file_array[$x], "r"); //Dynamic variables using braces: https://stackoverflow.com/a/9257536/9095603
    
            while (!feof(${'file' . $x})) {
                ${'csv_array' . $x}[] = fgetcsv(${'file' . $x});
            }
            fclose(${'file' . $x});
    
            ${'csv_array' . $x} = array_filter(${'csv_array' . $x}); // gets rid of last empty array in case present
            //CREATE HEADERED ARRAY
            ${'header' . $x} = array_shift(${'csv_array' . $x});
            foreach (${'csv_array' . $x} as ${'product_data_array' . $x} {
                ${'headered_array' . $x}[] = array_combine(${'header' . $x}, ${'product_data_array' . $x});
            }        
        }
        // How to simulate the SQL LEFT JOIN operation using PHP arrays?  https://stackoverflow.com/a/25837426

        //function to simulate the left join
        $left = $headered_array0;
        $right = $headered_array1;
        $final = array();

        if (empty($right_join_on)) // if $right_join_on omitted implies $right_join_on has the same value as $left_join_on
            $right_join_on = $left_join_on;
        foreach ($left AS $k => $v) {
            $final[$k] = $v; //basically keeping everything; $left just becomes $final
            foreach ($right AS $kk => $vv) {
                if ($v[$left_join_on] == $vv[$right_join_on]) { 
                    foreach ($vv AS $key => $val)
                        $final[$k][$key] = $val; 
                } else {
                    foreach ($vv AS $key => $val)
                   $final[$k][$key] = NULL; 
                }
            }
        }
        if ($html_preview == 'y') {
            echo '<pre>';
                var_dump($final);
            echo '</pre>';
        }
    
        // REINSTATE HEADERS
        // var_dump($final[0]);
        $indented_header = array(
            0 => array_keys($final[0])
        );
        $re_headered_array = array_merge($indented_header, $final);
    
        // write array to csv file
        $file2 = fopen($csv_output_file, "w");
        foreach ($re_headered_array as $line) {
            fputcsv($file2, $line);
        }
    
        fclose($file2);
    }
}

Call to function my_csv_join():

my_csv_join(array('C:\xampp\htdocs\kalugi\file0.csv','C:\xampp\htdocs\kalugi\file1.csv'), 'C:\xampp\htdocs\kalugi\file2.csv','y','Manufacturer ID','mpn');

Actual result shows not all records matching on the file0."Manufacturer ID" = file1.mpn are being matched. Thus some expected rows that satisfied the join condition were not joined. We have NULL in place of their values:

var_dump of result:

array(3) {
  [0]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU231"
    ["image"]=>
    string(10) "image1.jpg"
    ["description"]=>
    string(6) "A box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [1]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKUAG1"
    ["image"]=>
    string(11) "image22.jpg"
    ["description"]=>
    string(12) "Another box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [2]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU21D"
    ["image"]=>
    string(11) "image7a.png"
    ["description"]=>
    string(12) "A third box."
    ["mpn"]=>
    string(6) "SKU21D"
    ["length"]=>
    string(1) "5"
    ["height"]=>
    string(1) "8"
    ["width"]=>
    string(1) "5"
    ["title"]=>
    string(5) "Box 3"
  }
}

Result as written out to $file2 (file2.csv):

    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | SKU231          | image1.jpg  | A box.       |        |        |        |       |       |
    | SKUAG1          | image22.jpg | Another box. |        |        |        |       |       |
    | SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+

Why are 2 of the rows not joined despite file0."Manufacturer ID" = file1.mpn being satisfied?

1
  • Debug questions require a minimal reproducible example. PS Don't blindly use code & don't dump code that doesn't work; ask about the 1st place that the code doesn't do what you expect, with justification referencing competent documentation. How to Ask Help center PS Two other answers where you got that code say the code you took needs correction. Commented Aug 16 at 11:34

2 Answers 2

0

I'm going to simplify this down to an SSCCE so it's easy to understand and reproduce.

I think you're over-complicating the solution here which is probably what's causing you so many weird bugs. The solution is actually rather simple. All you need to do is take each csv file, parse it into rows and columns, and then merge each of those rows into one row to write it back out to another csv.

This is the simplest way to do it:

$file0 = <<<'FILE0'
Manufacturer ID,image,description
SKU231,image1.jpg,A box.
SKUAG1,image22.jpg,Another box.
SKU21D,image7a.png,A third box.
FILE0;

$file1 = <<<'FILE1'
mpn,length,height,width,title
SKU231,22,14,10,Box 1
SKUAG1,12,6,6,Box 2
SKU21D,5,8,5,Box 3
FILE1;

$csv1 = array_map('str_getcsv', explode("\n", $file0));
$csv2 = array_map('str_getcsv', explode("\n", $file1));

$fd = fopen("/tmp/test.csv", 'w');


foreach ($csv1 as $row => $columns) {

    $newRow = array_merge($columns, $csv2[$row]);
    fputcsv($fd, $newRow);

}

$fd = fopen("/tmp/test.csv", "r");

while (($line = fgets($fd)) !== false) {
    echo $line;
}

The result:

"Manufacturer ID",image,description,mpn,length,height,width,title
SKU231,image1.jpg,"A box.",SKU231,22,14,10,"Box 1"
SKUAG1,image22.jpg,"Another box.",SKUAG1,12,6,6,"Box 2"
SKU21D,image7a.png,"A third box.",SKU21D,5,8,5,"Box 3"

|Manufacturer ID | image       | description  | mpn    | length | height | width | title |
|----------------|-------------|--------------|--------|--------|--------|-------|-------|
| SKU231         | image1.jpg  | A box.       | SKU231 | 22     | 14     | 10    | Box 1 |
| SKUAG1         | image22.jpg | Another box. | SKUAG1 | 12     | 6      | 6     | Box 2 |
| SKU21D         | image7a.png | A third box. | SKU21D | 5      | 8      | 5     | Box 3 |
Sign up to request clarification or add additional context in comments.

6 Comments

I like the simplicity of your solution and will try to re-write my code to incorporate it, but do you have any idea what was wrong in the original code? I spent 2 days trying to debug it and I'm itching to know what was wrong...
A lot of things. That code is riddled with bugs. Which is why I found it counter intuitive to even attempt to fix it.
Thanks. If there's any really key mistakes you saw in my original code, I'm happy to have them pointed out, as I'm learning PHP right now.
I didn't see any one key mistake. I just saw a whole bunch of mistakes that would be rather unhelpful to the Stack Overflow community at large, to correct here. I find that what's more helpful is to address the solution to the given problem rather than spend this time as debugging exercise or code review session. I'm sorry if that isn't helpful to you, but I'm just trying to adhere to what's best for the community at large.
To be clear this answer does NOT merge data related by Manufacturer ID / mpn. It relates the lines from the two csv files purely on their line position. This answer will NOT be reliable unless the lines are 100% guaranteed to be order with match manufacturers on every line.
|
0

I have not tested this script on an actual server, but I expect it to handle any number of csv files where a single unique identifying column can relate separate datasets.

Effectively, my snippet below will open and parse csv data, then assign first level keys based on the passed in column index which relates the two datasets. If a dataset has an identifying value which has been encountered before, then the rows of data can be merged.

A gentle reminder: when you think variable variables are a good idea, it is probably time for a code review and refactor.

$csvs = [
    ['file1.csv', 0],
    ['file2.csv', 0]
];

$result = [];
foreach ($csvs as [$filename, $idIndex]) {
    $file = fopen($filename, 'r');
    if ($file) {
        $headers = fgetcsv($file);
        while (($line = fgetcsv($file)) !== false) {
            $result[$line[$idIndex]] = array_merge(
                $result[$line[$idIndex]] ?? [],
                array_combine($headers, $line)
            );
        }
        fclose($file);
    }
}
var_export($result);

For anyone who wants to simply append csv data as new columns of another csv's data (not "join" by a shared column value), the following can append data from a variable number of CSVs and retain column integrity even if not all columns are populated in earlier CSVs (assuming all files have full header/first rows). Demo

$handles = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv'];

$result = [];
foreach ($handles as $handle) {
    $padToColIndex = count($result[0] ?? []);
    $rowIndex = 0;
    while (($row = fgetcsv($handle, escape: '')) !== false) {
        $result[$rowIndex] = array_pad($result[$rowIndex] ?? [], $padToColIndex, '');
        array_push($result[$rowIndex++], ...$row);
    }
    fclose($handle);
}

var_export($result);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.