
There are some good responses in this thread to the question of how to use PowerShell to merge multiple CSV files into one, dropping the header row from all but the first file.
Kemiller2002's posted answer works well for me in most cases; however, when the output file exceeds 2GB I start getting out-of-memory exceptions. The following error message gets thrown:

Exception of type 'System.OutOfMemoryException' was thrown.
At xxx.ps1:9 char:20
+            $false {$lines | Select -Skip 1}
+                    ~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException  
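
For context, the failing line comes from an approach that loads each file in full with Get-Content and skips the first line of every file after the first, roughly like this (a reconstruction based on the error message above; the variable names and output path are assumptions):

$getFirstLine = $true
Get-ChildItem "$sourceFolder\*.csv" | ForEach-Object {
    $lines = Get-Content $_.FullName        # reads the entire file into memory at once
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }                   # keep the header row from the first file
        $false { $lines | Select -Skip 1 }  # drop the header row from the rest
    }
    $getFirstLine = $false
    Add-Content $mergedFile $linesToWrite
}

Get-Content materializes the whole file as an array of .NET strings, so each ~1GB CSV balloons to several gigabytes of objects in memory, which is the likely source of the exception.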

This is using PowerShell 5.1. It doesn't seem to be an issue with MaxMemoryPerShellMB (which reports as 2147483647), nor does it appear to be an issue with actual system memory: the last time I ran this, 33GB of the 64GB total was still free.

The script keeps running and keeps appending to the file (one of my final files winds up around 7GB in size), but when this error pops up I can't be certain that every line from every source file has been captured.

Any suggestions?

EDIT

I added some output so I could see where the errors occur. I'm appending 11 files, varying in size from around 350MB to 1GB; it is the two files of around 1GB that cause the errors. One has a reported length of 909,050,983 bytes and the other 973,429,260 bytes.

  • Well, the low-hanging fruit would be to make sure that you are running a 64-bit PowerShell instance and not a 32-bit instance. Commented Jan 2, 2018 at 16:40
  • Yup, sorry, I should have mentioned that it is running in 64-bit. Commented Jan 2, 2018 at 22:28

3 Answers


It's simply nuts the way some people do this, using the following approach:

Get-Content $SrcFile1, $SrcFile2 | Set-Content $DstFile

Don't do it that way! It's ridiculously slow and invariably leads to out-of-memory errors. Instead, use the good old file copy from the command processor, e.g.:

cmd /c "copy $($SrcFile1) + $($SrcFile2) $($DstFile)"


1 Comment

this is fast but adds a weird character at the end

I don't have a large file to test this with, but using .NET methods might be an alternative, as you can work on one row at a time rather than loading the whole thing into memory.

$filepath = "c:\temp"
$outputfile = "c:\temp\output\result.csv"
$encoding = [System.Text.Encoding]::UTF8

$files = Get-ChildItem -Path $filePath -Filter *.csv

# open the output writer once; $true means append if the output file already exists
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

$skiprow = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.FullName, $encoding)
    while (($line = $r.ReadLine()) -ne $null)
    {
        if (!$skiprow)
        {
            $w.WriteLine($line)
        }
        # only the first line of each file is ever skipped
        $skiprow = $false
    }
    $r.Close()
    $r.Dispose()
    # skip the header row of every file after the first
    $skiprow = $true
}

$w.Close()
$w.Dispose()
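
If the source files are not UTF-8, the same StreamReader/StreamWriter constructors accept any System.Text.Encoding; for example (the code page below is just an illustration, pick whichever matches your data):

# Windows-1252, common for legacy files with accented characters
$encoding = [System.Text.Encoding]::GetEncoding(1252)

# or the system's current ANSI code page
$encoding = [System.Text.Encoding]::Default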

8 Comments

I'll give that a shot tomorrow.
... but I think that would be very slow. It might be more efficient to manually edit out the headers with UltraEdit, then copy the files together and add back in a header row. DOH! Of course, I could script those manual steps too: copy the header from the first file into a temp file, strip the headers off all the files, copy them together into a temp body file, then merge the temp header file and the temp body file.
Whoa... I actually just started up your script with the intention of letting it run overnight, and it is proving to be far more efficient than the other script I was using. The other script took around 40 minutes to process the 11 files; here it has only been 10 minutes and your script is 75% done. It chewed through the two big files without hesitation.
Sorry, I would upvote your answer to mark it as the answer, but I don't have enough reputation.
How would I set the encoding options in there? Accents in my source file are all getting messed up. I know there is a way to set the encoding for StreamWriter, but I'm uncertain how to do it in the way you are invoking it.

A complete answer that evolved from the great point Bob made:

<###########################
user config section
###########################>

# location of files to concatenate
$sourcefolder = "P:\DWH\ntm_v1\uncompressed"

# source file extension
$ext = "*.csv"

# output folder (best to have new folder for safety)
$outfolder = $sourcefolder + "\..\concatenated"

#output file name
$outfilename = "concat.txt"

<###########################
do work
###########################>

# build full path to out file
$concatfile = $outfolder + "\" + $outfilename

#create output folder
md -Force $outfolder

# delete output file if exists
if (Test-Path $concatfile) 
{
  Remove-Item -Confirm $concatfile
}

ForEach ($file in (Get-ChildItem -Path $sourcefolder -Filter $ext)) {
    # quote the paths and use FullName so this works with spaces and from any directory
    $param = "type `"$($file.FullName)`" >> `"$concatfile`""
    Write-Host "cmd /c $param"

    # run concat command
    cmd /c $param
}
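
Note that this concatenates every file verbatim, headers included. To drop the header row on all but the first file, one option is cmd's more +1, which starts output after the first line (a sketch, untested on files this large; note that more also expands tabs to spaces):

$first = $true
ForEach ($file in (Get-ChildItem -Path $sourcefolder -Filter $ext)) {
    if ($first) {
        # keep the first file whole, header row included
        cmd /c "type `"$($file.FullName)`" >> `"$concatfile`""
        $first = $false
    }
    else {
        # more +1 skips the first line of the file (the header)
        cmd /c "more +1 `"$($file.FullName)`" >> `"$concatfile`""
    }
}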

1 Comment

Why not use Notepad++? Just wondering.
