This can be done relatively easy in PHP, no need for Python's Pandas.
a.csv:
905,bla,meh?,na3
16,meh,meh?,ni2
4,bla,meh?,put1
b.csv:
4,info,meh1
16,info,meh2
905,info,meh3
<?php
// load both files to array
$a = array_map('str_getcsv', file('a.csv'));
$b = array_map('str_getcsv', file('b.csv'));
var_dump($a, $b);
// index both arrays by the column you want to merge by
$mergeColumn = 0;
$a = array_combine(array_column($a, $mergeColumn), $a);
$b = array_combine(array_column($b, $mergeColumn), $b);
var_dump($a, $b);
// construct combined array by combining same indexes from both arrays
// and merging the values (skip merge column from second file so it is not doubled)
$c = [];
foreach ($a as $k => $dataA) {
unset($b[$k][$mergeColumn]);
$c[$k] = array_merge($dataA, $b[$k]);
}
var_dump($c);
// put it to output csv file
$fp = fopen('c.csv', 'w');
foreach ($c as $row) {
fputcsv($fp, $row);
}
fclose($fp);
Output:
// a.csv as array
array(3) {
[0]=>
array(4) {
[0]=>
string(3) "905"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(3) "na3"
}
[1]=>
array(4) {
[0]=>
string(2) "16"
[1]=>
string(3) "meh"
[2]=>
string(4) "meh?"
[3]=>
string(3) "ni2"
}
[2]=>
array(4) {
[0]=>
string(1) "4"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(4) "put1"
}
}
// b.csv as array
array(3) {
[0]=>
array(3) {
[0]=>
string(1) "4"
[1]=>
string(4) "info"
[2]=>
string(4) "meh1"
}
[1]=>
array(3) {
[0]=>
string(2) "16"
[1]=>
string(4) "info"
[2]=>
string(4) "meh2"
}
[2]=>
array(3) {
[0]=>
string(3) "905"
[1]=>
string(4) "info"
[2]=>
string(4) "meh3"
}
}
// a.csv keyed by merge column
array(3) {
[905]=>
array(4) {
[0]=>
string(3) "905"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(3) "na3"
}
[16]=>
array(4) {
[0]=>
string(2) "16"
[1]=>
string(3) "meh"
[2]=>
string(4) "meh?"
[3]=>
string(3) "ni2"
}
[4]=>
array(4) {
[0]=>
string(1) "4"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(4) "put1"
}
}
// b.csv keyed by merge column
array(3) {
[4]=>
array(3) {
[0]=>
string(1) "4"
[1]=>
string(4) "info"
[2]=>
string(4) "meh1"
}
[16]=>
array(3) {
[0]=>
string(2) "16"
[1]=>
string(4) "info"
[2]=>
string(4) "meh2"
}
[905]=>
array(3) {
[0]=>
string(3) "905"
[1]=>
string(4) "info"
[2]=>
string(4) "meh3"
}
}
// combined array
array(3) {
[905]=>
array(6) {
[0]=>
string(3) "905"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(3) "na3"
[4]=>
string(4) "info"
[5]=>
string(4) "meh3"
}
[16]=>
array(6) {
[0]=>
string(2) "16"
[1]=>
string(3) "meh"
[2]=>
string(4) "meh?"
[3]=>
string(3) "ni2"
[4]=>
string(4) "info"
[5]=>
string(4) "meh2"
}
[4]=>
array(6) {
[0]=>
string(1) "4"
[1]=>
string(3) "bla"
[2]=>
string(4) "meh?"
[3]=>
string(4) "put1"
[4]=>
string(4) "info"
[5]=>
string(4) "meh1"
}
}
and c.csv (outcome):
905,bla,meh?,na3,info,meh3
16,meh,meh?,ni2,info,meh2
4,bla,meh?,put1,info,meh1
Above solution works if both files have the same identifiers that merging is performed on (first column)
If there can be a row that doesnt have a match in other csv file then another approach is needed.
a.csv:
A,B,C,D
905,bla,meh?,na3
16,meh,meh?,ni2
4,bla,meh?,put1
1,a,b,c
b.csv:
A,Z,Y
4,info,meh1
16,info,meh2
905,info,meh3
2,d,e
a.csv contains a row with A: 1, there is no match for that in b.csv. b.csv contains a row with A: 2, there is no match for that in a.csv.
We expect result csv to have 5 rows (4, 16, 905 - common + 1 with empty Z,Y + 2 with empty B,C,D).
code:
<?php
$a = parseCsv('a.csv');
$b = parseCsv('b.csv');
$allHeaders = array_unique(array_merge($a['header'], $b['header']));
$mergeColumn = 'A';
$a['rows'] = array_combine(array_column($a['rows'], $mergeColumn), $a['rows']);
$b['rows'] = array_combine(array_column($b['rows'], $mergeColumn), $b['rows']);
$allIndexes = array_unique(array_merge(array_column($a['rows'], $mergeColumn), array_column($b['rows'], $mergeColumn)));
$c = [];
foreach ($allIndexes as $index) {
$row = [];
foreach ($allHeaders as $header) {
$row[$header] = '';
if (isset($a['rows'][$index][$header])) {
$row[$header] = $a['rows'][$index][$header];
} elseif (isset($b['rows'][$index][$header])) {
$row[$header] = $b['rows'][$index][$header];
}
}
$c[$index] = $row;
}
var_dump($c);
$fp = fopen('c.csv', 'w');
fputcsv($fp, $allHeaders);
foreach ($c as $row) {
fputcsv($fp, $row);
}
fclose($fp);
function parseCsv(string $file): array {
$rows = array_map('str_getcsv', file($file));
$header = array_shift($rows);
$csv = [];
foreach($rows as $row) {
$csv[] = array_combine($header, $row);
}
return ['header' => $header, 'rows' => $csv];
}
Output:
array(5) {
[905]=>
array(6) {
["A"]=>
string(3) "905"
["B"]=>
string(3) "bla"
["C"]=>
string(4) "meh?"
["D"]=>
string(3) "na3"
["Z"]=>
string(4) "info"
["Y"]=>
string(4) "meh3"
}
[16]=>
array(6) {
["A"]=>
string(2) "16"
["B"]=>
string(3) "meh"
["C"]=>
string(4) "meh?"
["D"]=>
string(3) "ni2"
["Z"]=>
string(4) "info"
["Y"]=>
string(4) "meh2"
}
[4]=>
array(6) {
["A"]=>
string(1) "4"
["B"]=>
string(3) "bla"
["C"]=>
string(4) "meh?"
["D"]=>
string(4) "put1"
["Z"]=>
string(4) "info"
["Y"]=>
string(4) "meh1"
}
[1]=>
array(6) {
["A"]=>
string(1) "1"
["B"]=>
string(1) "a"
["C"]=>
string(1) "b"
["D"]=>
string(1) "c"
["Z"]=>
string(0) ""
["Y"]=>
string(0) ""
}
[2]=>
array(6) {
["A"]=>
string(1) "2"
["B"]=>
string(0) ""
["C"]=>
string(0) ""
["D"]=>
string(0) ""
["Z"]=>
string(1) "d"
["Y"]=>
string(1) "e"
}
}
c.csv:
905,bla,meh?,na3,info,meh3
16,meh,meh?,ni2,info,meh2
4,bla,meh?,put1,info,meh1
1,a,b,c,,
2,,,,d,e
$csv1[3]and$csv2[0]are only ever going to give you one value. You need to keep one file open and during iteration of that bring in the other file and search for your value. This will be heavy on resources. Likely this should be put into a DB, or stored in a DB as a starting point. With DB this would just bejoin table using(A)