I am trying to write a script that takes an XML file, looks for lines matching a condition, and adds a new line of asterisks after each match. Once it has gone through the file, it strips all the XML tags and writes the remaining data to a plain text file.

The script has been tested on a small input XML file and works fine, but when I pass it a large XML file it takes forever (I'm not sure how long exactly; I ran it for over an hour with no result, so I stopped it).

I'm guessing I must be doing the work in an extremely inefficient manner, and I'm hoping you can help me make it fast and efficient.

Here is the script below:

# Takes input XML File, cleans up XML elements, outputs plain text file

$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = "ProcessSpecifier = ""true"""  
$FileOriginal = Get-Content $FileName

[String[]] $FileModified = @() 
Foreach ($Line in $FileOriginal)
{   
    $FileModified += $Line
    if ($Line -match $Pattern) 
    {
        #Add Lines after the selected pattern 
        $FileModified += "*************isActive=true*****************"      
    } 
}


$FileModified -replace "<[^>]+>", "" | Out-File C:\Users\someguy\Desktop\Output.txt

  • += $Line creates a new array every time you call it. Try ArrayList. Commented Dec 5, 2017 at 21:16
  • Which version of PowerShell are you targeting? Commented Dec 5, 2017 at 21:39
  • Can you give a rough estimate of the size of files we are talking about? Commented Dec 6, 2017 at 8:49
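
The suggestion in the first comment can be sketched roughly as follows. This is only an illustration, not the asker's final script: it uses a generic List[string] rather than ArrayList (an assumption on my part; both avoid the array copy that += causes), and the sample lines stand in for the real file contents.

```powershell
# Sample lines standing in for Get-Content output (made up for illustration)
$Lines = @(
    '<Item ProcessSpecifier = "true">first</Item>'
    '<Item ProcessSpecifier = "false">second</Item>'
)
$Pattern = 'ProcessSpecifier = "true"'
$Marker  = '*************isActive=true*****************'

# A list grows in place; += on an array copies the whole array each time
$Modified = New-Object 'System.Collections.Generic.List[string]'

foreach ($Line in $Lines) {
    # Strip the XML tags as the line is added
    $Modified.Add(($Line -replace '<[^>]+>', ''))
    if ($Line -match $Pattern) {
        # Add the marker line after a matching line
        $Modified.Add($Marker)
    }
}

$Modified
```

With the real file, $Lines would come from Get-Content and $Modified would be passed to Set-Content or Out-File.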

2 Answers

Let's use a lookbehind and a few regex replacements to speed things up. Also, I'm not going to build up a second array line by line; I just pass the results down the pipeline, which should help. (The parentheses around Get-Content do still read the whole file into memory before the replacements run.) I also trim whitespace from the beginning and end of each line and filter out blank lines, but you can remove that bit if you want.

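# Takes input XML File, cleans up XML elements, outputs plain text file

$FileName = "C:\Users\someguy\Desktop\input.xml"
# Zero-width lookbehind: matches at the end of any line that contains the search text
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
# Append the marker line, strip the tags, trim each line, drop blank lines, write the output
(Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s+|\s+$' | ?{$_} | Set-Content C:\Users\someguy\Desktop\Output.txt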

The main thing here is that I use a lookbehind to find your pattern text and then append a newline and the asterisk line to the matching line, so that the line

    <SomeTag>ProcessSpecifier = "true"</SomeTag>

becomes:

    <SomeTag>ProcessSpecifier = "true"</SomeTag>`n*************isActive=true*****************

Inside a double-quoted string, a backtick ` followed by n produces a newline, so the '*************isActive=true*****************' ends up on its own line immediately after the line that matched your search pattern. After that I remove the XML tags, and then any leading or trailing whitespace from each line.
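
To see what the replacement chain does without touching a file, here is a small sketch run against in-memory lines. The sample lines and the variable names $Lines, $Marker and $Result are made up for illustration, and the trim pattern '^\s+|\s+$' is one way to strip all leading and trailing whitespace.

```powershell
# Sample lines standing in for the XML file (hypothetical content)
$Lines = @(
    '<Root>'
    '  <SomeTag>ProcessSpecifier = "true"</SomeTag>'
    '  <Other>value</Other>'
    '</Root>'
)

# The lookbehind is zero-width, so -replace inserts text at the end of a
# matching line instead of replacing any of it
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
$Marker  = "`n*************isActive=true*****************"

$Result = $Lines -replace $Pattern, $Marker -replace '<[^>]+?>' -replace '^\s+|\s+$' |
    Where-Object { $_ }   # lines that were only tags are now empty and get dropped

# The matching line now carries the marker after an embedded newline
$Result[0] -split "`n"
$Result[1]
```

The two tag-only lines disappear, and the matching line becomes a single string holding two lines of text.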

After the regex replacements I pass the result to a Where statement that removes blank lines, and then pass the remaining lines to Set-Content, which in my experience performs better than Out-File.
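
If memory use matters for very large files, a line-at-a-time variant can be sketched as below. This is an alternative under stated assumptions rather than the code above: it drops the parentheses around Get-Content so lines stream through the pipeline, it uses a plain -match instead of the lookbehind, and the temp-file paths and sample content are hypothetical so the sketch can run on its own. Per-line pipeline processing is not necessarily faster, so it is worth timing on your data.

```powershell
# Hypothetical temp files so the sketch is self-contained
$InPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo-input.xml'
$OutPath = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo-output.txt'
Set-Content -Path $InPath -Value @(
    '<Root>'
    '  <SomeTag>ProcessSpecifier = "true"</SomeTag>'
    '  <Other>value</Other>'
    '</Root>'
)

$Pattern = 'ProcessSpecifier = "true"'
$Marker  = '*************isActive=true*****************'

# Without parentheses, Get-Content emits lines one at a time
Get-Content $InPath | ForEach-Object {
    # Remove tags, then trim leading and trailing whitespace
    $Text = ($_ -replace '<[^>]+>').Trim()
    if ($Text) { $Text }                      # skip lines that are now empty
    if ($_ -match $Pattern) { $Marker }       # marker goes after a matching line
} | Set-Content $OutPath

Get-Content $OutPath
```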


Variation of TheMadTechnician's answer:

# Takes input XML File, cleans up XML elements, outputs plain text file

$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
Set-Content -Path C:\Users\someguy\Desktop\Output.txt -Value (((Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s+|\s+$').Where{$_})

I try to avoid the pipeline where I can; as far as I know it is rather slow. Of course, you will run into problems with memory consumption if the files are very large. The ().Where construct doesn't work on all PowerShell versions (it needs version 4 or later).

This is a guess, I am not sure whether this is actually faster than TheMadTechnician's. I'd be curious about the result :)
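
One way to check would be to time both variants with Measure-Command. The sketch below is only a rough harness: the temp-file paths, the generated sample data and the variable names are assumptions for illustration, and the numbers will vary with machine, PowerShell version and file size.

```powershell
# Build a throwaway sample file (hypothetical path and contents)
$InPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'timing-demo-input.xml'
$OutPath = Join-Path ([System.IO.Path]::GetTempPath()) 'timing-demo-output.txt'
$Sample  = 1..2000 | ForEach-Object { "  <Item ProcessSpecifier = `"true`">row $_</Item>" }
Set-Content -Path $InPath -Value $Sample

$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
$Marker  = "`n*************isActive=true*****************"

# Variant 1: filter blank lines with Where-Object in the pipeline
$Piped = Measure-Command {
    (Get-Content $InPath) -replace $Pattern, $Marker -replace '<[^>]+?>' -replace '^\s+|\s+$' |
        Where-Object { $_ } |
        Set-Content $OutPath
}

# Variant 2: filter with the .Where() method (PowerShell 4 or later), no pipeline
$Method = Measure-Command {
    Set-Content -Path $OutPath -Value (((Get-Content $InPath) -replace $Pattern, $Marker -replace '<[^>]+?>' -replace '^\s+|\s+$').Where({ $_ }))
}

"Pipeline:  {0:N0} ms" -f $Piped.TotalMilliseconds
".Where():  {0:N0} ms" -f $Method.TotalMilliseconds
```

Running each variant several times and comparing the averages gives a fairer picture than a single run.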
