
I am trying to build a simple script that uses regex to match multiple patterns on a single line, applies that to every line of an input file, and writes the results to an output file. But I'm hitting a wall:

Sample text:

BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC

Here's what I've got so far:

$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
    $_.Matches.Value
} > output.txt

Output:

'KDDT111D.DIH0345S'

Desired output:

'KDDT111D.DIH0345S' 10 Physical

For some reason I am unable to get both patterns to write to output.txt. Ideally once I get this working I would like to use Export-Csv to get something a bit cleaner like:

|KDDT111D|DIH0345S|10 Physical|

3 Answers

1

i think you will find the -match operator a bit more suited to this. [grin] using named matches against your sample stored in $InStuff, this ...

$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

... gives the following set of matches ...

Name                           Value                                                                              
----                           -----                                                                              
Space                          KDDT111D                                                                           
SubSpace                       DIH0345S                                                                           
Discarded                      10 PHYSICAL                                                                        
0                              BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...

the named matches can be addressed by $Matches.<the capture group name>.
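As a minimal sketch of reading those groups, assuming the sample line is stored in $InStuff as above (copied here with a single space after each colon; $row is a name introduced only for illustration):

```powershell
# Copy of the sample line from the question, with a single space after each colon.
$InStuff = "BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS: 10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

# -match returns $true on success and fills the automatic $Matches hashtable.
if ($InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+") {
    # Each named capture group is available as a property of $Matches.
    $row = $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join '|'
    $row   # KDDT111D|DIH0345S|10 PHYSICAL
}
```

Note that -match against a single string reports only the first match, so a file has to be processed line by line.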


4 Comments

I don't believe this is the direction I want to go, but I appreciate the idea. The input file will have hundreds of records similar to the sample text I provided above. I would like to have the script pick up the input file from a specified directory. The script should then recursively extract the 'Space', 'SubSpace' and 'Discarded' match values, and write that output for each record to a txt/csv file.
@Jonmonjovi - ok ... but regex is usually the way to go for large files. [grin] i cannot get the Select-String cmdlet to work with an array of patterns such as you listed. i suspect it will not work at all that way. [sigh ...]
Unfortunately I have been unable to replicate your results using the match operator and my input file. PS version info below: PSVersion 5.1.17134.407 PSEdition Desktop PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...} BuildVersion 10.0.17134.407 CLRVersion 4.0.30319.42000 WSManStackVersion 3.0 PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1
there are so many ways a regex can go wrong [grin] ... it's nearly meaningless to discuss it without seeing both the actual code you used AND a few lines of the actual [sanitized] data file.
1

You have run into a Select-String limitation: The .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) only ever contains the (potentially multiple) matches for the first regex passed to the -Pattern parameter.[1]

You can work around the problem by passing a single regex instead, by combining the input regexes via alternation (|):

Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt | 
  ForEach-Object { $_.Matches.Value } > output.txt

A simplified example:

# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
  ForEach-Object { $_.Matches.Value }

The above yields:

fo
az

proving that the matches for both regexes were reported.

Caveat re output ordering: Using alternation (|) causes the matches for a given input string to be reported in the order in which they're found in the input, not in the order in which the regexes were specified.
That is, both -Pattern 'f.|.z' and -Pattern '.z|f.' above would have resulted in the same output order.
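Because the matches for each line arrive in input order, they can be joined to produce the one-line-per-record output asked for in the question. A minimal sketch, assuming the question's two patterns and its sample line held in a variable ($line and $result are names introduced only for illustration):

```powershell
# The two patterns from the question, as plain strings.
$table   = "'.*'"
$discard = "\d* PHYSICAL"

# Sample line from the question.
$line = "BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

# Combine the patterns via alternation, then join all matches found on a line with a space.
$result = $line | Select-String -Pattern ($table, $discard -join '|') -AllMatches |
  ForEach-Object { $_.Matches.Value -join ' ' }

$result   # 'KDDT111D.DIH0345S' 10 PHYSICAL
```

For a file, the same pipeline should work with a path argument such as .\test.txt in place of the piped string.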


[1] The problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue


0

Thanks to the contributors for the ideas and learning experience. I was able to get the desired output using a combination of both answers received.

I found that the -match operator only returned the first occurrence of the regex pattern match from the source file, so I needed to add a foreach loop to evaluate every line of the log file.

I also modified the regex to include only discard values greater than 0.

Sample Text:

BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC

Example:

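# Capture the space, subspace, and discard text; the trailing part only matches a logical count greater than 0.
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Date stamp for the output file name.
$timestamp = Get-Date -Format "MM_dd_yy"
$dir = "C:\Users\JonMonJovi"

# -match fills the automatic $Matches variable for each line that passes the filter.
Get-Content $dir\*.log.txt | Where-Object {
    $_ -match $regex
} | ForEach-Object {
    $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
} > C:\Users\JonMonJovi\Discarded\Discard_Log_$timestamp.txt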

Output:

KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL

From here I am able to use the pipe delimited .txt output file to import into Excel, fulfilling my requirements.
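For the CSV output mentioned in the question, one possible variation is to emit objects instead of joined strings. This is a sketch rather than what I actually ran: $lines and $records are illustrative names, the two lines are taken from the sample text above, and Trim() is added to drop the leading space in the Discarded value.

```powershell
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Two lines from the sample text: one with 0 discards, one with 11.
$lines = @(
    "BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
)

# Build one object per matching line so the CSV cmdlets can produce named columns.
$records = $lines | Where-Object { $_ -match $regex } | ForEach-Object {
    [pscustomobject]@{
        Space     = $Matches.Space
        SubSpace  = $Matches.SubSpace
        Discarded = $Matches.Discarded.Trim()
    }
}

# ConvertTo-Csv keeps the result in memory; the first line is the header row.
$csv = $records | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
$csv
```

To write a file instead, Export-Csv accepts the same -Delimiter and -NoTypeInformation parameters along with a -Path of your choosing.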
