I am trying to use RegEx to get blocks of data from a multi-line string.

String to search

***** a.txt
17=xxx
570=N
55=yyy
***** b.TXT
17=XXX
570=Y
55=yyy
*****

***** a.txt
38=10500.000000
711=1
311=0000000006630265
***** b.TXT
38=10500.000000
311=0000000006630265
*****

What I need: everything between the ***** lines, without the ***** header lines themselves

17=xxx
570=N
55=yyy

17=XXX
570=Y
55=yyy

38=10500.000000
711=1
311=0000000006630265

38=10500.000000
311=0000000006630265

My code so far

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.MultiLine = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch
    Next
End If
Set objRegEx = Nothing

2 Answers

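You need to turn the final \*\*\*\*\* part of your pattern, which currently consumes the closing asterisks, into a positive lookahead. Otherwise the closing ***** of one block is consumed and cannot also open the next block. It is also strongly recommended to replace (?:.|\n|\r)*? with [\s\S]*?, since the alternation slows the matching process down.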

Use

\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})

and read the first item in Submatches. The .*[\r\n]+ part skips the rest of the line that starts with *****, so the file name is not captured.

Details:

  • \*{5} - 5 asterisks
  • (?!\s*\*{5}) - fail the match if the asterisks are followed by 0+ whitespace characters and then 5 more asterisks
  • .*[\r\n]+ - match the rest of the line plus the line break(s) that follow it
  • ([\s\S]*?) - Capturing group 1 (its value is stored in the Submatches property of the Match object), matching any 0+ chars, as few as possible, up to the first...
  • (?=\*{5}) - ...location followed by 5 asterisks; they are not consumed, only their presence is checked.
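
To see the pieces above working together, here is a minimal sketch that runs the pattern against a hard-coded string instead of command output. The sample text is a shortened, hypothetical copy of the first block from the question, and the variable names (sample, re, m) are illustrative only. It is meant to be run with cscript.exe.

```vbscript
Option Explicit

' Hypothetical sample input, shaped like the first block in the question.
Dim sample
sample = "***** a.txt" & vbCrLf & _
         "17=xxx" & vbCrLf & _
         "570=N" & vbCrLf & _
         "***** b.TXT" & vbCrLf & _
         "17=XXX" & vbCrLf & _
         "570=Y" & vbCrLf & _
         "*****" & vbCrLf

Dim re, m
Set re = New RegExp
re.Global = True   ' find every block, not just the first one
re.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"

For Each m In re.Execute(sample)
    ' m.Value also contains the "***** name" header line;
    ' SubMatches(0) holds only the text captured by group 1.
    WScript.Echo "[" & m.SubMatches(0) & "]"
Next
```

Because the closing asterisks are only looked at, not consumed, the ***** b.TXT line can end the first block and still start the second one. Each captured block keeps its trailing line break, so trim it if that matters for your output.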

See the regex demo

If you unroll the regex, it will look uglier, but it is much more efficient:

\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)

See another regex demo

VBS code:

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch.Submatches(0)
    Next
End If
Set objRegEx = Nothing

8 Comments

Thanks Wiktor. A couple of issues with this pattern: 1) The headers are also extracted (e.g. ***** a.txt). I don't need those. 2) It also matches the blank line between ***** and ***** a.txt.
OK, 2) is true, but it is easy to handle with a lookahead (see my updated answer). As for 1), as I said, you must read the first submatch, strMatch.Submatches(0). Note that you do not need to set objRegEx.MultiLine = True, since the pattern uses no ^ or $ anchors.
When I am using the second regex, I am also getting ***** at the end which I don't get when I use the first regex.
There is no way you can have them in the capturing group. See regex101.com/r/ss6Xux/5. The (?:\*(?!\*{4})[^*]*)* part only matches a * that is not followed by 4 more *s.
Yeah, that was my mistake. Apparently there were 2 blank lines at the end of the string which were messing it up. I trimmed the string and now it is working. I now just need to work on removing the headers from each match.

Just capture the sets of consecutive numbered lines:

Option Explicit

Dim data
    With WScript.CreateObject("WScript.Shell")
        data = .Exec("fc.exe /n 1.txt 2.txt").StdOut.ReadAll()
    End With 

Dim match
    With New RegExp
        .Pattern = "(?:^[ ]*[0-9].*?$[\r\n]+)+"
        .Global = True
        .MultiLine = True
        For Each match in .Execute( data )
            WScript.StdOut.WriteLine "---------------------------------------"
            WScript.StdOut.WriteLine match.Value
        Next 
    End With 

3 Comments

It works only if there are no lines not starting with digits and will also match any text before the first ***** if the lines start with digits. It does not check if the data is in between the asterisks.
There will never be any line starting with digit in my scenario. And I too am actually parsing the information i get via fc (building a vbs solution to match difference between 2 text files).
@WiktorStribiżew, you are right, the regexp is completely adapted to the format of the data, but I have the advantage (sorry) of knowing that the data being processed is the output of an fc /n command.
