2

I have an SSIS Script Task written in C# that I want to port to PowerShell so it can be used as a script. The C# version runs in 12.1s, but the PowerShell version takes 100.5s, almost an order of magnitude slower. I'm processing 11 text files (CSV), each with about 3-4 million rows in this format:

<TICKER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
AUDJPY,20010102,230100,64.30,64.30,64.30,64.30,4
AUDJPY,20010102,230300,64.29,64.29,64.29,64.29,4
<snip>

I simply want to write to a new file the rows whose date column is 20110101 or later. Here's my C# version:

    private void ProcessFile(string fileName)
    {
        string outfile = fileName + ".processed";
        StringBuilder sb = new StringBuilder();
        using (StreamReader sr = new StreamReader(fileName))
        {
            string line;
            int year;
            while ((line = sr.ReadLine()) != null)
            {
                year = Convert.ToInt32( sr.ReadLine().Substring(7, 4));
                if (year >= 2011)
                {
                    sb.AppendLine(sr.ReadLine());
                }
            }
        }

        using (StreamWriter sw = new StreamWriter(outfile))
        {
            sw.Write(sb.ToString());
        }
    }

Here's my PowerShell version:

foreach($file in ls $PriceFolder\*.txt) {
    $outFile = $file.FullName + ".processed"
    $sr = New-Object System.IO.StreamReader($file)
    $sw = New-Object System.IO.StreamWriter($outFile)
    while(($line = $sr.ReadLine() -ne $null))
    {       
        if ($sr.ReadLine().SubString(7,4) -eq "2011") {$sw.WriteLine($sr.ReadLine())}
    }   
}

How can I get the same performance in PowerShell that I get from my C# Script Task in SSIS?

1
  • Just curious, do you intend to call ReadLine() three times inside both loop examples? Looks like it'll skip one line, match the second, print the third, and then repeat. Commented Aug 3, 2011 at 20:31
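To illustrate the point raised in this comment, here is a minimal sketch of the PowerShell loop that reads each line exactly once. It assumes the layout shown in the question (a 6-character ticker, a comma, then the date as YYYYMMDD, so the year starts at index 7). The function name `Copy-RecentRows` and its parameters are hypothetical, introduced only for this example. It also compares the year with `-ge`, matching the C# `>= 2011` test, where the original PowerShell used `-eq "2011"`.

```powershell
# Sketch only: Copy-RecentRows is a hypothetical name, not part of the original script.
# Assumes the year occupies characters 7-10 of every data row (6-character ticker).
function Copy-RecentRows {
    param(
        [string]$Path,          # full path of the input file
        [int]$MinYear = 2011    # keep rows dated in this year or later
    )
    $outFile = $Path + ".processed"
    $sr = $null
    $sw = $null
    try {
        $sr = New-Object System.IO.StreamReader($Path)
        $sw = New-Object System.IO.StreamWriter($outFile)
        # Parenthesize the assignment so $line holds the text, not a Boolean.
        while ($null -ne ($line = $sr.ReadLine())) {
            # Skip lines too short to contain a year.
            if ($line.Length -lt 11) { continue }
            $year = 0
            # TryParse fails on the header row, so the header is skipped as well.
            if ([int]::TryParse($line.Substring(7, 4), [ref]$year) -and $year -ge $MinYear) {
                $sw.WriteLine($line)
            }
        }
    }
    finally {
        # Dispose flushes the writer and releases both file handles.
        if ($sw) { $sw.Dispose() }
        if ($sr) { $sr.Dispose() }
    }
    return $outFile
}
```

With the question's folder variable, this could be called as `foreach ($file in Get-ChildItem $PriceFolder\*.txt) { Copy-RecentRows -Path $file.FullName }`. Passing `FullName` matters because .NET resolves relative paths against the process working directory, which can differ from PowerShell's current location.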

3 Answers

2

You cannot get PowerShell performance comparable to C# unless you actually use C# from within PowerShell. The Add-Type cmdlet lets you compile C# snippets, usually trivial ones, and call them directly from scripts. If performance is an issue and using C# assemblies is not possible for some reason, I would go this way.

See examples here: http://go.microsoft.com/fwlink/?LinkID=135195
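As a rough sketch of what this could look like for the filter in the question, the snippet below compiles a small C# class with `Add-Type` and exposes one static method to the script. The class name `PriceFilter` and the method `Copy` are hypothetical names chosen for this example. The year offset (index 7, length 4) is taken from the question's code and assumes 6-character tickers.

```powershell
# Sketch only: PriceFilter and Copy are hypothetical names for this example.
$source = @'
using System;
using System.IO;

public static class PriceFilter
{
    // Copies lines whose year (characters 7-10) is >= minYear.
    // Returns the number of lines written.
    public static int Copy(string inPath, string outPath, int minYear)
    {
        int written = 0;
        using (StreamReader sr = new StreamReader(inPath))
        using (StreamWriter sw = new StreamWriter(outPath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                int year;
                // The header row fails TryParse and is skipped.
                if (line.Length >= 11 &&
                    int.TryParse(line.Substring(7, 4), out year) &&
                    year >= minYear)
                {
                    sw.WriteLine(line);
                    written++;
                }
            }
        }
        return written;
    }
}
'@

# Skip compilation if the type is already loaded in this session.
if (-not ('PriceFilter' -as [type])) {
    Add-Type -TypeDefinition $source -Language CSharp
}
```

Once the type is loaded, the loop from the question reduces to something like `foreach ($file in Get-ChildItem $PriceFolder\*.txt) { [PriceFilter]::Copy($file.FullName, $file.FullName + ".processed", 2011) }`. The per-line work then runs as compiled code, and PowerShell only drives the outer loop over files.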


1

Some time ago I saw a question and tried to answer it; look at http://social.technet.microsoft.com/Forums/en/winserverpowershell/thread/da36e346-887f-4456-b908-5ad4ddb2daa9. Frankly, the performance penalty of using PowerShell was so large that for time-consuming tasks I would always choose either C# or Add-Type, as @Roman suggested.

1

You are translating the C# to PowerShell, which may not be ideal in every case. Yes, using C# will give you better performance, but that does not mean you cannot get comparable performance with PowerShell as well.

You should try to take advantage of "streaming" in PowerShell pipelines.

For example, something like:

gc file.txt | ?{ process.....} | %{process...} | out-file out.txt

This should be faster, since objects are passed along the pipeline as soon as they are available.

Can you try an equivalent using Get-Content and pipelining?
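For reference, a pipeline equivalent of the question's filter might look like the sketch below. The function name `Copy-RecentRowsPipeline` is hypothetical, and the year offset (index 7, length 4) again assumes the 6-character tickers shown in the question. `-ReadCount 1000` makes `Get-Content` emit lines in batches rather than one at a time, which is a common way to reduce per-object pipeline overhead. Whether this approaches the StreamReader loop on multi-million-row files is not something this sketch establishes, so it is worth timing with `Measure-Command`.

```powershell
# Sketch only: Copy-RecentRowsPipeline is a hypothetical name for this example.
function Copy-RecentRowsPipeline {
    param(
        [string]$Path,          # input file
        [int]$MinYear = 2011    # keep rows dated in this year or later
    )
    $outFile = $Path + ".processed"
    Get-Content -LiteralPath $Path -ReadCount 1000 |
        ForEach-Object {
            # With -ReadCount 1000, $_ is a batch of up to 1000 lines.
            foreach ($line in $_) {
                # The regex check rejects the header row before the [int] cast.
                if ($line.Length -ge 11 -and
                    $line.Substring(7, 4) -match '^[0-9]{4}$' -and
                    [int]$line.Substring(7, 4) -ge $MinYear) {
                    $line
                }
            }
        } |
        Set-Content -LiteralPath $outFile
    return $outFile
}
```

Each matching line is emitted downstream as soon as its batch has been processed, so the file is never held in memory as a whole, unlike the StringBuilder approach in the C# version.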
