I am new to PowerShell, so please bear with me.

I have this pattern

(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})

to match this text:

111 is different from:
111 is different from:
123 is different from:
567.

This only gives 1 match, whereas there are 2 instances in the text. How can I get both? The pattern consumes 123 in the first match, so it can't be found again for the second one. I had to repeat the command several times to work around this, but I believe there are better ways. Please help.

I tried turning the 123 part of the pattern into a lookahead, but then I couldn't capture the 123.
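To see the consuming effect described above, here is a minimal sketch. It assumes the sample text is joined with LF newlines, and the variable names are illustrative only. It counts the matches of the original pattern and of a variant that captures the next line's 3 digits inside a lookahead instead of consuming them:

```powershell
# Sample input built with explicit LF newlines, so the result does not
# depend on the line endings of the script file itself.
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

# The original pattern: the trailing (\d{3}) *consumes* the next line's
# digits, so a second match can no longer start at "123".
$consuming = '(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})'

# Same idea, but the next line's digits are captured inside a lookahead,
# so they remain available as the start of the next match.
$lookahead = '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)'

$consumingCount = [regex]::Matches($text, $consuming).Count
$lookaheadCount = [regex]::Matches($text, $lookahead).Count

"Consuming pattern: $consumingCount match(es)"
"Lookahead pattern: $lookaheadCount match(es)"
```

With the lookahead variant, the match for the 111/123 pair ends right before 123, so the next match can start there.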

Goal: I want to insert a line, a sentence, between the two different values.

EDIT: The desired result looks like this:

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
  *** This value 567 ***
567. 
  • Your pattern does not seem to match anything (see regex101.com/r/CO7siv/1). In the example data, group 1 is always empty because the string starts with 3 digits. The negative lookahead after the newline will not succeed. Can you update the question with an example of the desired result? Commented Jan 14, 2023 at 9:36
  • The character ^ indicates beginning of line and $ indicates end of line. You do not put the return characters into a Regex. So use "^\d{3}.*:$" Commented Jan 14, 2023 at 10:32
    Please provide a better example and also show us the desired output because now it is unclear what you are trying to achieve Commented Jan 14, 2023 at 13:10
    @Eddie Do you mean like this? ^(?=(\d{3})(.*:\r?\n)(?!\1)(\d{3})) See the 3 capture group values regex101.com/r/N0rCk2/1 Commented Jan 14, 2023 at 15:30
    To answer the question, the -replace (regex) operator always performs a global replace, whether its input is a string array or a raw string with line endings. Commented Jan 14, 2023 at 16:36
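One caveat regarding the ^/$ suggestion in the comments: in .NET's multiline mode ((?m)), $ matches only before \n, not before \r\n, so :$ fails on lines with Windows-style CRLF endings. Here is a small sketch to check this. It assumes the question's sample lines and applies the suggested pattern to the whole multiline string by adding (?m):

```powershell
# Sample lines from the question.
$lines = '111 is different from:', '111 is different from:', '123 is different from:', '567.'

$lf   = $lines -join "`n"     # Unix-style line endings
$crlf = $lines -join "`r`n"   # Windows-style line endings

# With (?m), $ matches before "\n" (or at the very end), but the "\r" in
# "\r\n" sits between the ":" and that position.
$pattern   = '(?m)^\d{3}.*:$'
$lfCount   = [regex]::Matches($lf, $pattern).Count
$crlfCount = [regex]::Matches($crlf, $pattern).Count

# Allowing an optional \r before the end-of-line assertion handles both.
$robust      = '(?m)^\d{3}.*:\r?$'
$robustCount = [regex]::Matches($crlf, $robust).Count

"LF: $lfCount, CRLF: $crlfCount, CRLF with \r?: $robustCount"
```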

2 Answers

Note that PowerShell's -replace operator is invariably global, i.e. it always looks for and replaces all matches of the given regex.

Use the following -replace operation instead:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
            '$0  *** This value $3 ***$2'

Note: The @'<newline>...<newline>'@ string literal used for the multiline input text is a so-called here-string.

Output:

111 is different from:
111 is different from:
  *** This value 123 ***
123 is different from:
  *** This value 567 ***
567.
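Because the newline is captured and re-inserted via $2, the same command should also preserve Windows-style (CRLF) line endings. Here is a minimal sketch to check that. It builds the input explicitly with "`r`n" rather than relying on a here-string, which typically inherits the line endings of the script file:

```powershell
# Build the sample input with Windows-style CRLF line endings.
$inputCrlf = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`r`n"

# Same -replace operation as above; $2 re-inserts whichever newline
# sequence (CRLF here) was captured by (\r?\n).
$result = $inputCrlf -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)',
                              '$0  *** This value $3 ***$2'

# Every LF in the result should still be preceded by a CR.
$hasBareLf = $result -match '(?<!\r)\n'
"Bare LF present: $hasBareLf"
```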
  • For a detailed explanation of the regex and the ability to experiment with it, see this regex101.com page, but in short:

    • (?m) is the inline form of the Multiline .NET regex option, which makes ^ and $ match at the start and end of each line.

    • ^(\d{3}) therefore matches a 3-digit sequence only at the start of a line, in a capture group, and .+: matches a space and at least one additional character on the same line all the way to a : at the end.

    • (\r?\n) captures the specific newline sequence encountered (which may be CRLF (Windows-format) or just LF (Unix-format)) in a 2nd capture group.

      • Capturing the specific newline sequence allows you to replicate it in the substitution string via placeholder $2, to ensure that the newly inserted line is terminated with the same sequence.

      • If you don't care about potentially mixing \r\n and \n in the resulting string, you could omit the 2nd capture group (the 3-digit capture group then becomes the 2nd one, hence $2 below) and use "`n" (sic) or "`r`n" instead, via an expandable string ("...") containing an escape sequence. Note that, unlike in C#, \r and \n are not recognized in PowerShell string literals. Only the .NET regex engine recognizes them, and not in the substitution operand of -replace, which is not a regex and in which only $-prefixed placeholders are recognized.

        # Conceptually cleaner: separate the verbatim part from
        # the expandable part.
        ('$0  *** This value $2 ***' + "`n")
        
        # Alternative, using a single "..." string
        # The '$' characters that are part of -replace *placeholders*
        # must be *escaped as `$* to prevent up-front expansion by PowerShell
        "`$0  *** This value `$2 ***`n"
        
    • (?!\1)(?=(\d{3})\b) uses both a negative ((?!...)) and a positive ((?=...)) lookahead assertion to look for 3 digits at the start of the next line (at a word boundary, due to \b) that aren't the same as the 3 digits on the current line (\1 being a backreference to what the 1st capture group matched).

      • Note that using a capture group inside an overall by-definition non-capturing lookaround assertion is possible, and indeed used above to capture the 3-digit sequence at the start of the subsequent line, referenced via placeholder $3 in the substitution string.
    • In the substitution string, $0, $2 and $3 refer to what the entire regex, the 2nd capture group, and the 3rd capture group matched, respectively ($& may be used in lieu of $0; see this answer for more info about these placeholders).

      • Note that by using a string as the substitution operand, you are limited to embedding captured text as-is, via placeholders such as $0. If you need to determine the substitution text fully dynamically, i.e. if it needs to apply transformations based on each match:

        • In PowerShell (Core) 7+, you can use a script block { ... } instead.

        • In Windows PowerShell, you'll have to call the underlying [regex]::Replace() method directly.

      • See below.


To spell out the fully dynamic substitution approach, adding 1 to the captured number in this example:

PowerShell (Core) 7+ solution, using a script block ({ ... }) as -replace's substitution operand:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', {
               '{0}  *** This value + 1: {1} ***{2}' -f $_.Value, ([int] $_.Groups[3].Value + 1), $_.Groups[2].Value
            }

Windows PowerShell solution, where a direct call to the underlying [regex]::Replace() method is required:

$str = @'
111 is different from:
111 is different from:
123 is different from:
567.
'@

[regex]::Replace(
  $str, 
  '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
  {
    param($m)
    '{0}  *** This value + 1: {1} ***{2}' -f $m.Value, ([int] $m.Groups[3].Value + 1), $m.Groups[2].Value
  }
)

Output (note that 1 has been added to each captured value):

111 is different from:
111 is different from:
  *** This value + 1: 124 ***
123 is different from:
  *** This value + 1: 568 ***
567.
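The substitution-string rules described in this answer can be verified with a few one-liners on a trivial, made-up input string 'ab'. Those rules are: \n is not recognized, a real newline must come from a PowerShell "..." string, $ must be escaped as `$ inside "...", and $& is equivalent to $0.

```powershell
# \n is NOT an escape in a -replace substitution string: it stays literal.
$literal = 'ab' -replace 'a', '$0\n'          # -> 'a\nb' (backslash + n)

# A real newline must come from PowerShell itself, e.g. "`n" in a "..." string.
$real = 'ab' -replace 'a', ('$0' + "`n")      # -> "a<LF>b"

# In a single "..." string, escape the placeholder's $ as `$ so that
# PowerShell does not try to expand $0 as a variable up front.
$escaped = 'ab' -replace 'a', "`$0`n"         # same result as $real

# $& is an alternative spelling of $0 (the entire match).
$amp = 'ab' -replace 'a', '[$&]'              # -> '[a]b'
```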

You can use 2 capture groups: use the first group in a negative lookahead, and the second group to build the desired result in the replacement.

^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)

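In the replacement, use the full match and group 2 (note that in a PowerShell -replace substitution string, \n is not recognized as a newline escape, so there you would use "`n" in an expandable string instead):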

$0\n   *** This value $2 ***

See a .NET regex101 demo.

Output

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
   *** This value 567 ***
567.
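The pattern and replacement above are written for regex101. In a PowerShell -replace operation, ^ needs the (?m) inline option to match at the start of every line rather than only at the start of the string. Also, \n in the substitution string would be inserted literally, so the newline has to be supplied via "`n". Here is a sketch of an equivalent PowerShell command, assuming LF-separated input:

```powershell
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

# (?m) makes ^ match at the start of every line.
# The next line's digits are captured in group 2 inside the lookahead,
# so the match itself ends right before the newline.
$pattern = '(?m)^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)'

# "`n" supplies the newline; `$ keeps PowerShell from expanding $0 and $2
# itself, so -replace sees them as placeholders.
$result = $text -replace $pattern, "`$0`n   *** This value `$2 ***"
$result
```

With CRLF input, this inserts a bare LF, which is the line-ending trade-off mentioned in the other answer.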

If you only want to match the position at the start of a line, asserting that the next line does not start with the same 3 digits as the current one, put the whole pattern inside a positive lookahead assertion:

^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)

See another .NET regex101 demo.
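Since this variant's matches are zero-width (empty strings at line starts), the useful information is in its capture groups. Here is a sketch for inspecting them in PowerShell via [regex]::Matches(). It again adds (?m) so that ^ applies to every line, and the output property names are made up for illustration:

```powershell
$text = @(
  '111 is different from:'
  '111 is different from:'
  '123 is different from:'
  '567.'
) -join "`n"

# The whole pattern sits inside a lookahead, so each match has length 0;
# group 1 holds this line's digits, group 3 the next line's digits.
$pattern = '(?m)^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)'

$found = foreach ($m in [regex]::Matches($text, $pattern)) {
  [pscustomobject]@{
    Index     = $m.Index     # position of the line start
    Length    = $m.Length    # always 0 for this zero-width pattern
    ThisValue = $m.Groups[1].Value
    NextValue = $m.Groups[3].Value
  }
}
$found | Format-Table
```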
