We have a program that creates email signatures and stores them in a deployment folder; they are then copied to each user's local folder at login. However, when an employee is not assigned to an office, the comma separators for City/State still come along for the ride, as shown in this example:

Example Email signature

The problem is that the program's source code cannot be found. Long term, I will rewrite it. Short term, I need a PowerShell script that will run every night to remove the line containing the commas. I found the following solution here on Stack Overflow:

Get-ChildItem C:\temp\emailsigs -Filter *.htm | Foreach-Object{
(Get-Content $_.FullName) | 
Foreach-Object {$_ -replace " ,   &nbsp; ,   &nbsp; <br />", ""} | 
Set-Content $_.FullName
}

This actually works pretty well. But I notice that every signature HTM file (over 1,100 of them) gets its timestamp updated, even when only 2 email signatures need the empty comma line removed. Is there a more efficient way to first check whether a file contains the offending commas, replace only then, and skip the majority?

2 Answers

The following PSv5+ solution won't be memory-efficient, but should speed up processing while avoiding rewriting of files that don't need it:

Get-ChildItem C:\temp\emailsigs -Filter *.htm |
  ForEach-Object {
    $oldContent = Get-Content -Raw $_.FullName
    $newContent = $oldContent -replace ' ,   &nbsp; ,   &nbsp; <br />'
    if ($newContent.Length -lt $oldContent.Length) { # was a replacement performed?
      Set-Content $_.FullName -NoNewline -Value $newContent
    }
  }
  • -Raw is PSv3+ and reads the entire file as a single string.

    • In PSv2, you could use [System.IO.File]::ReadAllText() instead, but note that it assumes UTF-8 as the encoding in the absence of a BOM, whereas Get-Content assumes "ANSI" encoding[1] (the system's legacy "ANSI" code page), so you may have to specify the encoding explicitly.
  • Processing each file as a single string speeds up processing (though each file must fit into memory twice). Because -replace leaves an input string unmodified if the regex doesn't match, we can compare the length of the original content to the length of the replaced content to see whether something matched and the file therefore needs rewriting.
    Thus, we only need a single regex operation per file.

    • Also note that ... -replace '...' - i.e., not specifying a replacement string - is equivalent to ... -replace '...', '', i.e., to effectively remove what was matched.
  • -NoNewline requires PSv5+; it prevents an additional newline from getting appended on output.

    • In PSv4-, you could use [System.IO.File]::WriteAllText() instead, but note that its default encoding is UTF-8 without a BOM, whereas Set-Content, like Get-Content, defaults to "ANSI" encoding[1].

[1] The above applies to Windows PowerShell. The cross-platform PowerShell Core edition defaults to (BOM-less) UTF-8 as well.
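
As a rough illustration of the PSv2-compatible route mentioned in the notes above, the following sketch reads and writes each file through [System.IO.File] with an explicit encoding and rewrites a file only when the replacement shortened it. The function name Remove-EmptyCommaLine and its parameters are hypothetical, and the default encoding is an assumption: [System.Text.Encoding]::Default is the "ANSI" code page in Windows PowerShell but UTF-8 in PowerShell Core, so pass the encoding your files actually use.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Remove-EmptyCommaLine {
    param(
        [string] $Path,
        # Assumption: files use the platform default encoding. Pass
        # [System.Text.Encoding]::UTF8 explicitly if they are UTF-8.
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::Default
    )

    $pattern = ' ,   &nbsp; ,   &nbsp; <br />'

    # PSv2 has no -File switch, so directories are filtered out by hand.
    $files = Get-ChildItem $Path -Filter *.htm | Where-Object { -not $_.PSIsContainer }

    foreach ($file in $files) {
        # Read the whole file as one string, using the same encoding for read and write.
        $old = [System.IO.File]::ReadAllText($file.FullName, $Encoding)
        $new = $old -replace $pattern, ''

        # A shorter string means at least one match was removed.
        if ($new.Length -lt $old.Length) {
            [System.IO.File]::WriteAllText($file.FullName, $new, $Encoding)
            $file.FullName   # output the path of each file that was rewritten
        }
    }
}
```

Calling Remove-EmptyCommaLine -Path C:\temp\emailsigs would then output only the paths of the files it rewrote and leave the timestamps of all other files alone.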

4 Comments

I confirmed that the htm file is not in use by another program. I suspect this script keeps the file locked, so I had to award the second solution, as it works.
@user3195770: Yes, it was the Get-Content in my original answer that kept each input file locked, preventing its replacement. However, the revised version that I posted a few hours ago avoids that problem by using Get-ChildItem. It still has the advantage of matching the file content only once and being generally faster due to processing the files as single strings (also, I just added another optimization to only compare the content length to determine if a replacement was made).
Cool, I will attempt to test in a few. Wish I could award both answers.
@user3195770: Understood: Generally, if multiple answers solve your problem equally (hopefully not in the exact same manner, because that would make them duplicates), I suggest accepting the one most likely to help future readers. To show appreciation for the others, you'd normally up-vote them, but don't have enough reputation yet (requires >= 15).

Another method:

Get-ChildItem C:\temp\emailsigs -File -Filter *.htm | ForEach-Object {
    $CurrentFile = $_
    $Content = Get-Content $CurrentFile.FullName -Encoding UTF8
    if ($Content -like '* ,   &nbsp; ,   &nbsp; <br />*') {
        $Content.Replace(' ,   &nbsp; ,   &nbsp; <br />', '') | Set-Content $CurrentFile.FullName -Encoding UTF8
    }
}

I use UTF-8 encoding to preserve diacritics.
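
To preview which files would be touched before running a nightly job, one option is a read-only pass with Select-String. This is only a sketch: the function name Get-SignatureFilesToFix is hypothetical, and it assumes the offending text appears in the files exactly as in the question. -SimpleMatch treats the pattern as a literal string rather than a regex, and -List stops at the first match in each file.

```powershell
# Hypothetical helper; lists files containing the empty comma line without modifying them.
function Get-SignatureFilesToFix {
    param([string] $Path)

    $pattern = ' ,   &nbsp; ,   &nbsp; <br />'

    Get-ChildItem $Path -Filter *.htm |
        Where-Object { -not $_.PSIsContainer } |
        Select-String -SimpleMatch -List -Pattern $pattern |
        ForEach-Object { $_.Path }   # one path per matching file
}
```

For example, Get-SignatureFilesToFix -Path C:\temp\emailsigs should list only the handful of signatures that still need fixing.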
