6

I have the following CSV files, I want to merge these into a single CSV

01.csv

apples,48,12,7
pear,17,16,2
orange,22,6,1

02.csv

apples,51,8,6
grape,87,42,12
pear,22,3,7

03.csv

apples,11,12,13
grape,81,5,8
pear,11,5,6

04.csv

apples,14,12,8
orange,5,7,9

Desired output:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9

Can anyone provide guidance on how to achieve this? Preferably using Powershell but open to alternatives like Perl if that's easier.

Thanks Pantik, your code's output is close to what I want:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6

Unfortunately I need "placeholder" commas in place for when the entry is NOT present in a CSV file, e.g. orange,22,6,1,,,,,,5,7,9 rather than orange,22,6,1,5,7,9

UPDATE: I would like these parsed in order of the filenames, e.g.:

$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){

regards ted

1
  • 1
    It looks like you want the data ordered by filenames. For example, you have empty records in orange from 2.csv and 3.csv. If this is a requirement, you should add that to the question. Commented May 13, 2011 at 8:55

6 Answers 6

4

Here is my Perl version:

use strict;
use warnings;

my $filenum = 0;

my ( %fruits, %data );
foreach my $file ( sort glob("*.csv") ) {

    $filenum++;
    open my $fh, "<", $file or die $!;

    while ( my $line = <$fh> ) {

        chomp $line;

        my ( $fruit, @values ) = split /,/, $line;

        $fruits{$fruit} = 1;

        $data{$filenum}{$fruit} = \@values;
    }

    close $fh;
}
foreach my $fruit ( sort keys %fruits ) {

    print $fruit, ",", join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ), "\n";
}

Which gives me:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

So do you have a typo for grape or i have misunderstood something?

Sign up to request clarification or add additional context in comments.

2 Comments

Your default sort will not do for filenames beyond 9.csv though, as 11.csv will come before 2.csv.
Thanks gangablass... yes in my desired output I have a typo , there should not be a space immediately after grape. Have updated.
2

Ok, gangabass solution works, and is cooler than mine, but I'll add mine anyway. It is slightly stricter, and preserves a data structure that can be used as well. So, enjoy. ;)

use strict;
use warnings;

opendir my $dir, '.' or die $!;
my @csv = grep (/^\d+\.csv$/i, readdir $dir);
closedir $dir;
# sorting numerically based on leading digits in filename
@csv = sort {($a=~/^(\d+)/)[0] <=> ($b=~/^(\d+)/)[0]} @csv;

my %data;

# To print empty records we first need to know all the names
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    while (<$fh>) {
        if (m/^([^,]+),/) {
            @{ $data{$1} } = ();
        }
    }
    close $fh;
}

# Now we can fill in values
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    my %tmp;
    while (<$fh>) {
        chomp;
        next if (/^\s*$/);
        my ($tag,@values) = split (/,/);
        $tmp{$tag} = \@values;
    }
    for my $key (keys %data) {
        unless (defined $tmp{$key}) {
            # Fill in empty values
            @{$tmp{$key}} = ("","","");
        }
        push @{ $data{$key} }, @{ $tmp{$key} };
    }
}

&myreport; 

sub myreport {
    for my $key (sort keys %data) {
        print "$key," . (join ',', @{$data{$key}}), "\n";
    }
}

Comments

2

Powershell:

$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}

$myFiles = @(gci *.csv) | sort Name
 foreach ($file in $myFiles){ 
    $file_hash = @{}
    $produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
        get-content $file | foreach-object{
            $line = $_.split(",")
            $file_hash[$line[0]] = $line[1..3]
            }
    $produce | foreach-object {
        $produce_hash[$_] += $file_hash[$_]
        }
  }

$ofs = ","
$out = @()
$produce | foreach-object {
 $out += [string]$produce_hash[$_]
 }

$out | out-file "outputfile.csv" 

gc outputfile.csv

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

Should be easy to modify for additional items. Just add them to the $produce array.

3 Comments

thanks mjolinor, can this modified so that you don't need to manually enter the items in the $produce array... as it may not be known in advance what the items will be...
Possible. 2 ways I see to do that: 1- read through the data twice, using the first pass to collect the unique values of the first element to build the $produce array. 2 - set up a counter and increment as you process each file so you know you may arrays of $nulls you need to add before the first set of values you got for that item. Which one works best may depend on how many / how big your data files are.
Posted a second solution that auto-populates $produce
2

Second Powershell solution (as requested)

   $produce = @()
   $produce_hash = @{}
    $file_count = -1
    $myFiles = @(gci 0*.csv) | sort Name
     foreach ($file in $myFiles){ 
        $file_count ++
        $file_hash = @{}
                get-content $file | foreach-object{
                $line = $_.split(",")

                if ($produce -contains $line[0]){
                    $file_hash[$line[0]] += $line[1..3]
                    }

                else {
                    $produce += $line[0]
                    $file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 *  $file_count) + $line[1..3]
                    }

                  }
              $produce | foreach-object { 
                if ($file_hash[$_]){$produce_hash[$_] += $file_hash[$_]} 
                else {$produce_hash[$_] += @(,$null) * 3}
               }

    }          

    $ofs = ","
    $out = @()
    $produce_hash.keys | foreach-object {
     $out += [string]$produce_hash[$_]
     }

    $out | out-file "outputfile.csv" 

    gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

Comments

1

you have to parse the files, I don't see easier way hot to do it

solution in powershell:

UPDATE: ok, adjusted a bit - hopefully understandable

$items = @{}
$colCount = 0 # total amount of columns
# loop through all files
foreach ($file in (gci *.csv | sort Name))
{
    $content = Get-Content $file
    $itemsToAdd = 0; # columns added by this file
    foreach ($line in $content)
    {
        if ($line -match "^(?<group>\w+),(?<value>.*)") 
        { 
            $group = $matches["group"]
            if (-not $items.ContainsKey($group)) 
            {   # in case the row doesn't exists add and fill with empty columns
                $items.Add($group, @()) 
                for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
            }

            # add new values to correct row
            $matches["value"].Split(",") | foreach { $items[$group] += $_ }
            $itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
        } 
    }

    # in case that file didn't contain some row, add empty cols for those rows
    $colCount += $itemsToAdd
    $toAddEmpty = @()
    $items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
    foreach ($key in $toAddEmpty) 
    {   
        for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
    }
}

# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys)
{
    "$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}

Output:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

1 Comment

Thanks PantikT for this effort, much appreciated - please see my update to the my question for feedback, as this does not quite generate the output I'm looking for.
0

Here is a more consise way how to do it. However, it still doesn't add the commas when the item is missing.

Get-ChildItem D:\temp\a\ *.csv | 
    Get-Content |
    ForEach-Object -begin { $result=@{} } -process {
        $name, $otherCols = $_ -split '(?<=\w+),'
        if (!$result[$name]) { $result[$name] = @() }
        $result[$name] += $otherCols
    } -end {
        $result.GetEnumerator() | % {
            "{0},{1}" -f $_.Key, ($_.Value -join ",")
        }
    } | Sort

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.