I have a file, input.txt, containing text like this:

GRP123456789
    123456789012
GRP234567890
    234567890123
GRP456789012
    "A lot of text. More text. Blah blah blah: Foobar." (Source Error) (Blah blah blah)
GRP567890123
    Source Error
GRP678901234
    Source Error
GRP789012345
    345678901234
    456789012345

I'm attempting to capture all occurrences of "GRP#########" on the condition that at least one number is on the next line.

So GRP123456789 is valid, but GRP456789012 and GRP678901234 are not.

The RegEx pattern I came up with on http://regexstorm.net/tester is: (GRP[0-9]{9})\s\n\s+[0-9]

The PowerShell script I have so far, based off this site http://techtalk.gfi.com/windows-powershell-extracting-strings-using-regular-expressions/, is:

$input_path = 'C:\Users\rtaite\Desktop\input.txt'
$output_file = 'C:\Users\rtaite\Desktop\output.txt'

$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'

select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Values } > $output_file

I'm not getting any output, and I'm not sure why.

Any help with this would be appreciated as I'm just trying to understand this better.

  • Try (?m)^GRP[0-9]{9}(?=\r?\n\s+[0-9]) Commented Dec 12, 2016 at 20:53
  • @WiktorStribiżew Nope, still no output. Commented Dec 12, 2016 at 20:55
  • Try the above one with a correction of $_.Values to $_.Value. Also, try replacing $_.Values with $_.Groups[1].Value and try with your own regex (it should work if your file has CRLF endings). Commented Dec 12, 2016 at 20:56
  • @WiktorStribiżew Unfortunately no, still no output. Would Value only give me one result, instead of multiple with Values? Or is Values simply invalid? I tried with and without Groups and both forms of Value, and for the hell of it tried setting the Group index to 0. Nothing. Commented Dec 12, 2016 at 20:58
  • There is no Values property, only Value. You will get all occurrences because you ask to fetch -AllMatches. Commented Dec 12, 2016 at 20:59
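
For anyone following the comments, here is a minimal sketch of the lookahead pattern suggested above, run against an in-memory string rather than the file. The names `$text`, `$pattern` and `$found` are made up for illustration, and the sample uses LF-only line endings on purpose to show what the optional `\r?` buys you.

```powershell
# Hypothetical sample text with LF-only line endings (not the real input.txt).
$text = "GRP123456789`n    123456789012`nGRP567890123`n    Source Error`n"

# (?m) lets ^ match at the start of every line.
# The lookahead (?=...) checks the next line without making it part of the match,
# so Value holds only the GRP identifier.
$pattern = '(?m)^GRP[0-9]{9}(?=\r?\n\s+[0-9])'

# Each Match object exposes Value (singular); there is no Values property.
$found = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }
$found
```

Only the identifier whose next line starts with a digit should come back here. Note that this works because the whole text is one string; fed line by line, the pattern can never see across a line break.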

3 Answers

2

You need to turn the text input into a single string before passing it to Select-String, otherwise the cmdlet will operate on each line individually and thus never find a match.

Get-Content $input_path | Out-String |
    Select-String $regex -AllMatches |
    Select-Object -Expand Matches |
    ForEach-Object { $_.Groups[1].Value } |
    Set-Content $output_file

If you're using PowerShell v3 or newer, you can replace Get-Content | Out-String with Get-Content -Raw.
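
As a rough illustration of why the single-string approach works, the sketch below runs the pattern from the question against an in-memory string that stands in for what Get-Content -Raw would return. The variable names `$sample` and `$ids` are hypothetical, and CRLF line endings are assumed because the original pattern needs the `\s` before `\n` to consume a carriage return.

```powershell
# Hypothetical stand-in for (Get-Content $input_path -Raw); CRLF endings are an assumption.
$sample = @(
    'GRP123456789'
    '    123456789012'
    'GRP456789012'
    '    "A lot of text." (Source Error)'
    'GRP678901234'
    '    Source Error'
    'GRP789012345'
    '    345678901234'
) -join "`r`n"

# Pattern from the question: \s consumes the CR, \n the LF.
$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'

# Because the input is one multi-line string, the pattern can span the line break.
$ids = $sample |
    Select-String -Pattern $regex -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1].Value }

$ids
```

With this sample, only the two identifiers followed by a numeric line should be returned. If the file had LF-only endings, the pattern would likely need `\r?\n` in place of `\s\n`.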

1 Comment

Both methods you mentioned worked with my original pattern, thank you!
1

To extract strings from a text file using a pattern, the best tool for the job is Select-String. It also has a parameter called -Context, which lets you capture lines before or after the matched line, ideal for just this problem.

So my solution would be something like this:

Select-String 'input.txt' -Pattern '^GRP[0-9]{9}' -Context 0, 1 | ? {
    $_.Context.PostContext -match '\d'
} | Select -ExpandProperty line | Set-Content 'output_file.txt'
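
To try the -Context idea without touching real files, here is a small self-contained sketch. It writes a few of the question's sample lines to a temporary file (the `$tmp` and `$kept` names are made up), and it uses a slightly stricter check, `'^\s+\d'`, which is my own variation: it requires the digit to be the first non-blank character of the following line, so a line such as "    Error 404" would not count.

```powershell
# Write a few sample lines to a temporary file (hypothetical stand-in for input.txt).
$tmp = [System.IO.Path]::GetTempFileName()
@(
    'GRP123456789'
    '    123456789012'
    'GRP567890123'
    '    Source Error'
    'GRP789012345'
    '    345678901234'
    '    456789012345'
) | Set-Content -Path $tmp

# -Context 0, 1 attaches zero preceding lines and one following line to each match.
# PostContext is a string array; -match returns the elements that match,
# so an empty result is treated as false by Where-Object.
$kept = Select-String -Path $tmp -Pattern '^GRP[0-9]{9}' -Context 0, 1 |
    Where-Object { $_.Context.PostContext -match '^\s+\d' } |
    Select-Object -ExpandProperty Line

Remove-Item $tmp
$kept
```

One nice property of this approach is that it does not care whether the file uses CRLF or LF line endings, since Select-String still works line by line.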

0

Using

[regex]::Matches($(Get-Content '.\Desktop\new 1.txt'), "GRP\d+(?=\s+\d)") |
    % { $_.value | Out-File .\Desktop\new-1-matches.txt -Append }

I achieved the following output from your sample file:

GRP123456789
GRP234567890
GRP789012345

1 Comment

You may want to put the Out-File after the ForEach-Object to avoid repeatedly appending to the output file.
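
A minimal sketch of what that suggestion could look like, assuming a small in-memory sample instead of the real file (`$lines`, `$joined`, `$values`, `$out` and `$written` are hypothetical names). As far as I know, passing the array from Get-Content to a [string] parameter joins the lines with spaces, which is why `\s+` in the answer's pattern can bridge two lines; the explicit -join below just makes that step visible.

```powershell
# Hypothetical stand-in for Get-Content, which returns one string per line.
$lines = 'GRP123456789', '    123456789012', 'GRP567890123', '    Source Error'

# Make the implicit array-to-string conversion explicit: lines joined by a space.
$joined = $lines -join ' '

# Lookahead keeps the following digits out of the match value.
$values = [regex]::Matches($joined, 'GRP\d+(?=\s+\d)') | ForEach-Object { $_.Value }

# Write once at the end of the pipeline instead of appending per match.
$out = [System.IO.Path]::GetTempFileName()
$values | Set-Content -Path $out

$written = Get-Content -Path $out
Remove-Item $out
$written
```

Writing once also means the output file is replaced on each run rather than growing with every execution, which is usually what you want here.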
