56

I am trying to remove all the lines that contain a partial string from a text file, using the PowerShell code below:

 Get-Content C:\new\temp_*.txt | Select-String -pattern "H|159" -notmatch | Out-File C:\new\newfile.txt

The actual string is H|159|28-05-2005|508|xxx. It repeats in the file multiple times, and I am trying to match only the first part, as specified above. Is that correct? Currently the output is empty.

Why is this failing and how can I resolve it?

8 Answers

63

If you want to write the result back to the same file, you can do the following:

Set-Content -Path "C:\temp\Newtext.txt" -Value (get-content -Path "c:\Temp\Newtext.txt" | Select-String -Pattern 'H\|159' -NotMatch)

3 Comments

This is exactly what I wanted. Thx @Samselvaprabu!
For large text files, this method is quite slow. Do you have any ideas on how to improve the performance?
@Gora For debugging performance, simplifying the regex or using simpler string matching might help. Otherwise, I'd abandon PowerShell (who knows when PowerShell buffers vs. streams) and try native system commands like grep or findstr.exe, which are plenty fast.
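One way to try the "simpler string matching" idea without leaving PowerShell is to stream the file through .NET directly. The following is only a sketch: the file names are hypothetical, and whether it is actually faster on a given file should be checked with Measure-Command.

```powershell
# Hypothetical scratch files for illustration; point these at real paths.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'perf_in.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'perf_out.txt'
Set-Content -Path $inPath -Value 'H|159|a', 'D|200|b', 'H|159|c', 'T|300|d'

# StreamWriter(path) writes UTF-8 without a byte order mark by default.
$writer = [System.IO.StreamWriter]::new($outPath)
try {
    # ReadLines enumerates lazily, so the whole file is never held in memory.
    foreach ($line in [System.IO.File]::ReadLines($inPath)) {
        # Literal, case-sensitive substring test; no regex involved.
        if (-not $line.Contains('H|159')) { $writer.WriteLine($line) }
    }
}
finally {
    $writer.Dispose()
}
```

The output path must differ from the input path here, because the input is still being read while the output is written.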
32

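Escape the | character with a backslash, which is the regex escape character. A backtick inside a single-quoted string is passed to the regex engine unchanged, so it does not escape the pipe.

get-content c:\new\temp_*.txt | select-string -pattern 'H\|159' -notmatch | Out-File c:\new\newfile.txt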

3 Comments

Warning - I used this to attempt to update a file in-place and the file was deleted.
Long lines get cut off with Out-File. I resolved this by using Set-Content instead, with the same syntax.
Out-File adds empty lines to the output (for non matching lines), but Set-Content doesn't, which I guess is indeed the desired behaviour.
8

You don't need Select-String in this case; just filter the lines out with Where-Object:

Get-Content C:\new\temp_*.txt |
    Where-Object { -not $_.Contains('H|159') } |
    Set-Content C:\new\newfile.txt

String.Contains does a literal, case-sensitive substring comparison instead of a regex match, so you don't need to escape the pipe character, and it's also faster.
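A small sketch of what "string comparison instead of a regex" means in practice. The sample lines are made up for illustration; the lowercase one is there to show that .Contains() is case-sensitive, unlike PowerShell's own comparison operators.

```powershell
# Hypothetical sample lines.
$lines = 'H|159|28-05-2005|508|xxx', 'h|159|lowercase', 'D|200|other'

# .Contains(): literal and case-sensitive, so the lowercase "h|159" line is kept.
$viaContains = $lines | Where-Object { -not $_.Contains('H|159') }

# -notlike: the pipe is also literal here, but the comparison is case-insensitive,
# so both "H|159" and "h|159" lines are removed.
$viaLike = $lines | Where-Object { $_ -notlike '*H|159*' }
```

If case-insensitive removal is what you want, -notlike is the closer match; if exact casing matters, .Contains() is.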

3 Comments

I like this solution over Fourkeys' because (unless I'm an idiot) Select-String also adds file name and line number to the output, which isn't desired in my use case.
@tolache I don't see that behaviour with Select-String here (PS 5).
@Fuujuhi You don't get file names and line numbers if you pass the input strings through a pipe as above. When Select-String reads a file itself (Select-String pattern file.txt), however, it outputs the file name and line number by default, as the man page shows.
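The difference discussed in these comments comes from Select-String returning MatchInfo objects rather than plain strings. A minimal sketch, using a scratch file created only for this example, of how to get the bare line text in either mode:

```powershell
# Piped input: the result is still a MatchInfo object; .Line holds the text.
$piped = 'H|159|a', 'D|200|b' | Select-String -Pattern 'H\|159' -NotMatch
$pipedLine = $piped.Line

# File input: the same object type, now also carrying the path and line number.
$tmp = [System.IO.Path]::GetTempFileName()   # scratch file for illustration
Set-Content -Path $tmp -Value 'H|159|a', 'D|200|b'
$fromFile = Select-String -Path $tmp -Pattern 'H\|159' -NotMatch
$fileLine   = $fromFile.Line
$lineNumber = $fromFile.LineNumber
Remove-Item $tmp
```

Selecting .Line explicitly (for example with ForEach-Object Line) before writing the output avoids depending on how MatchInfo is converted to text.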
8

Another option for writing to the same file, building on the existing answers. Just wrap the read-and-filter pipeline in parentheses so that it completes before the content is sent to the file.

(get-content c:\new\sameFile.txt | select-string -pattern 'H\|159' -notmatch) | Set-Content c:\new\sameFile.txt

4 Comments

In my tests, the brackets do not change anything in the output produced, which makes sense. However, using Out-File instead of Set-Content adds empty lines.
Thanks for the tip on Out-File, I have updated it to Set-Content. Without the brackets it would be writing to the file at the same time it is reading from it (in this one line example). The brackets force the read operation to complete before it starts writing to it.
Ok! This enforces sequential access and emulates in-place editing of the file. Now it's clear, good tip!
Cleanest solution imho: read -> filter -> write, and applying parentheses to enforce the order of execution to be able to write to the same file. Thanks a lot, just learned something new.
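The read, filter, write order described in these comments can be tried safely on a scratch file. This is a minimal sketch; the temp file and its contents are made up for the example.

```powershell
# Scratch file with one line to remove and one to keep.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'H|159|28-05-2005|508|xxx', 'D|200|keep me'

# The parentheses make the read-and-filter pipeline run to completion first,
# so the file has been fully read before Set-Content opens it for writing.
(Get-Content $path | Where-Object { $_ -notmatch 'H\|159' }) | Set-Content $path

$result = Get-Content $path
Remove-Item $path
```

Because the whole file is collected in memory before it is written back, this pattern suits small and medium files better than very large ones.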
4

The pipe character | has a special meaning in regular expressions. a|b means "match either a or b". If you want to match a literal | character, you need to escape it:

... | Select-String -Pattern 'H\|159' -NotMatch | ...
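To see the alternation at work, here is a short sketch with made-up sample lines. It also shows two ways to avoid hand-escaping: the -SimpleMatch switch and [regex]::Escape.

```powershell
# Hypothetical sample lines; the second contains an "H" but not "H|159".
$lines = 'H|159|28-05-2005|508|xxx', 'D|200|Hello', 'T|300|end'

# Unescaped: "H" OR "159". The "H" in "Hello" matches, so that line is dropped too.
$unescaped = $lines | Select-String -Pattern 'H|159' -NotMatch | ForEach-Object Line

# Escaped: only the literal text "H|159" is matched.
$escaped = $lines | Select-String -Pattern 'H\|159' -NotMatch | ForEach-Object Line

# -SimpleMatch treats the pattern as plain text, so no escaping is needed.
$simple = $lines | Select-String -Pattern 'H|159' -SimpleMatch -NotMatch | ForEach-Object Line

# [regex]::Escape produces the escaped pattern for you.
$auto = $lines | Select-String -Pattern ([regex]::Escape('H|159')) -NotMatch | ForEach-Object Line
```

With the unescaped pattern only 'T|300|end' survives; the other three variants keep both 'D|200|Hello' and 'T|300|end'.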

2 Comments

In PowerShell, the escape character is the backtick (`). See About Escape Characters.
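@orad I am aware of that. In regular expressions, however, the escape character is the backslash. The backtick form only appears to work in this case: it never reaches the regex engine as an escape, so the pattern still matches any line containing 159.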
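The two escape characters act at different layers, which a quick check makes visible. This sketch uses a made-up test line that contains 159 but not H|159.

```powershell
$single = 'H`|159'   # single quotes: the backtick stays in the string
$double = "H`|159"   # double quotes: PowerShell consumes the backtick, leaving H|159

$singleLength  = $single.Length          # 6
$doubleIsPlain = $double -eq 'H|159'     # True

# Neither form escapes the pipe for the regex engine, so "159" alone still matches.
$matchSingle    = 'D|159|x' -match $single     # True
$matchDouble    = 'D|159|x' -match $double     # True
$matchBackslash = 'D|159|x' -match 'H\|159'    # False
```

Only the backslash form restricts the match to the literal text H|159.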
3

This is probably a long way around a simple problem, but it does allow me to remove lines containing any of a number of matches. I did not have a single partial match that could be used, and I needed to do this on over 1000 files. This post helped me get to where I needed to be, thank you.

$ParentPath = "C:\temp\test"
$Files = Get-ChildItem -Path $ParentPath -Recurse -Include *.txt
$Match1 = "matchtext1"
$Match2 = "matchtext2"
$Match3 = "matchtext3"
$Match4 = "matchtext4"
$Match5 = "matchtext5"
$Match6 = "matchtext6"
$Match7 = "matchtext7"
$Match8 = "matchtext8"
$Match9 = "matchtext9"
$Match10 = "matchtext10"

foreach ($File in $Files) {
    $FullPath = $File.FullName
    $OldContent = Get-Content $FullPath
    $NewContent = $OldContent `
    | Where-Object {$_ -notmatch $Match1} `
    | Where-Object {$_ -notmatch $Match2} `
    | Where-Object {$_ -notmatch $Match3} `
    | Where-Object {$_ -notmatch $Match4} `
    | Where-Object {$_ -notmatch $Match5} `
    | Where-Object {$_ -notmatch $Match6} `
    | Where-Object {$_ -notmatch $Match7} `
    | Where-Object {$_ -notmatch $Match8} `
    | Where-Object {$_ -notmatch $Match9} `
    | Where-Object {$_ -notmatch $Match10}
    Set-Content -Path $FullPath -Value $NewContent
    Write-Output $File
}

2 Comments

Thanks for the solution! It looks like you've missed the $Match1 declaration though, and in testing this locally, it appears to add a blank line at the end of every file.
Thank you for that, I had $Match2 twice, hence missing $Match1. As for the line at the end, not something I looked into as it does not affect the usability of my files. If I find a way of removing it I'll drop it in comments
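The ten variables and chained Where-Object calls in this answer could also be collapsed into one array and a single test per line. The following is a sketch rather than a drop-in replacement: the pattern list and sample lines are hypothetical, and each pattern is treated as a regex, as in the script above.

```powershell
# Hypothetical patterns, each interpreted as a regex.
$Patterns = 'matchtext1', 'matchtext2', 'matchtext3'

# Join into one alternation so every line is tested only once.
$Combined = $Patterns -join '|'

# Hypothetical sample lines standing in for a file's content.
$Lines = 'keep this', 'has matchtext2 inside', 'also keep', 'matchtext3'
$Kept  = $Lines | Where-Object { $_ -notmatch $Combined }
```

If the patterns are meant as literal text rather than regexes, escape each one first, for example with ($Patterns | ForEach-Object { [regex]::Escape($_) }) -join '|'.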
0

If anyone runs into the following issue with Set-Content while doing what Robert Brooker suggested:

*These files have different encodings. Left file: Unicode (UTF-8) with signature. Right file: Unicode (UTF-8) without signature. You can resolve the difference by saving the right file with the encoding Unicode (UTF-8) with signature.*

pass -Encoding UTF8, like this:

(get-content c:\new\sameFile.txt | select-string -pattern 'H\|159' -notmatch) | Set-Content c:\new\sameFile.txt -Encoding UTF8

0

I would personally prefer ForEach-Object for faster operation, since it avoids the use of Select-String.

$inputFile = "C:\new\temp_*.txt"
$outputFile = "C:\new\newfile.txt"

Get-Content $inputFile | ForEach-Object {
    if ($_ -notmatch "H\|159") {
        $_
    }
} | Set-Content $outputFile
