Regex pattern that gets a string from two lines

Question

I have created and tested this regex pattern: <\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)

I apply it like this:

 Regex regexPattern = new Regex(@"<\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)");
 var attributeChecker = regexPattern.Match(line);
 var attributeLongDescription = attributeChecker.Groups[3].ToString().Trim();

Here is the sample input:

<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges

The results I am getting from group 3 are:

Projektierung D-Weg Freimeldung nicht
Länge des Durchrutschweges

How can I get the following results from group 3 instead?

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges

You cannot do that; you need to match all the lines below your pattern match that are continuations of the comment, and then post-process the result.
– Wiktor Stribiżew, Commented Oct 15, 2021 at 12:27
@WiktorStribiżew Could you please give me an example of how to do that?
– user15519784, Commented Oct 15, 2021 at 12:28
I was working on the code, but you already have an answer.
– Wiktor Stribiżew, Commented Oct 15, 2021 at 12:31
@WiktorStribiżew If you have a better answer, you can post yours and I will remove mine.
– The fourth bird, Commented Oct 15, 2021 at 12:31
@AdrianHHH Now, the question is 1) different and 2) unclear, as there is no input text and no sample to test the pattern against. kn1ghtx, please keep the question as it was; if there is just a slight issue with the current solution, please drop a comment below the answer(s), and if it is a bigger thing, please consider asking a new question. Rolling back to the latest normal question for the time being.
– Wiktor Stribiżew, Commented Oct 18, 2021 at 21:45

Wiktor Stribiżew · Accepted Answer · 2021-10-15 12:46:26Z

You cannot capture disjoint parts of a string into a single capturing group. You need to match all the lines below your pattern match that are continuations of the comment, and then post-process the result.

You can use the following approach (see the C# demo):

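using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";
// Group 3: the rest of the first comment line, plus every following line that
// holds only horizontal whitespace ([\s-[\r\n]]) before another "//" comment.
var matches = Regex.Matches(text, @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)");
foreach (Match m in matches)
{
    Console.WriteLine("--- A new match ---");
    Console.WriteLine($"Group 1: {m.Groups[1].Value}");
    Console.WriteLine($"Group 2: {m.Groups[2].Value}");
    // Split group 3 on "//", trim each piece and join the pieces with single spaces.
    Console.WriteLine("Group 3: {0}",
        string.Join(" ",
            m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim())
        )
    );
}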

Output:

--- A new match ---
Group 1: PDWFNA     
Group 2:  0;        
Group 3: Projektierung D-Weg Freimeldung nicht auswerten
--- A new match ---
Group 1: PDWLE      
Group 2:  0;        
Group 3: Länge des Durchrutschweges

8 Comments

Wiktor Stribiżew Over a year ago

Just FYI: if the comment lines can contain blank lines in between, replace [\s-[\r\n]] with just \s.

The fourth bird Over a year ago

Nice, is [\s-[\r\n]] the same as [\p{Zs}\t]?

Wiktor Stribiżew Over a year ago

@Thefourthbird [\s-[\r\n]] is equivalent to [^\S\r\n]. [\p{Zs}\t] is more specific.

Wiktor Stribiżew Over a year ago

@kn1ghtx You probably want to exclude the comments that are followed by asterisks; try <\w{2}:Value> SYMBOL: (P.*)=(.*)//(?!\s*\*)(.*(?:\n[\s-[\r\n]]*//(?!\s*\*).*)*)

Wiktor Stribiżew Over a year ago

@kn1ghtx Of course not. \n in the regex pattern needs to find an LF char, and when you read line by line, there is no LF in the input.
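The question applies the pattern to a single line variable, which, as the comment above explains, can never contain an LF. A sketch of a line-by-line alternative, assuming each continuation line holds only whitespace followed by a // comment: match the SYMBOL line with a single-line pattern, then append the text of the following //-only lines until some other line appears. The sample lines and the ParseSymbols helper are hypothetical; in practice the lines might come from File.ReadLines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input, one array element per line of the file.
string[] lines =
{
    "<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht",
    "                                          // auswerten",
    "<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges",
};

foreach (var (name, text) in ParseSymbols(lines))
    Console.WriteLine($"{name}: {text}");

// Hypothetical helper: pairs each SYMBOL line with its comment, appending any
// directly following lines that contain only a "//" comment.
static List<(string Symbol, string Description)> ParseSymbols(IEnumerable<string> input)
{
    var symbolLine = new Regex(@"<\w{2}:Value> SYMBOL: (P[^=]*)=(.*?)//(.*)");
    var result = new List<(string Symbol, string Description)>();
    string? current = null;
    var parts = new List<string>();

    // Store the symbol collected so far (if any) and reset the state.
    void Flush()
    {
        if (current != null)
            result.Add((current, string.Join(" ", parts)));
        current = null;
        parts.Clear();
    }

    foreach (var line in input)
    {
        var m = symbolLine.Match(line);
        if (m.Success)
        {
            Flush(); // a new SYMBOL line ends the previous comment block
            current = m.Groups[1].Value.Trim();
            parts.Add(m.Groups[3].Value.Trim());
        }
        else if (current != null && line.TrimStart().StartsWith("//"))
        {
            // Continuation line: drop the indentation and the leading "//".
            parts.Add(line.TrimStart().Substring(2).Trim());
        }
        else
        {
            Flush(); // any other line ends the current comment block
        }
    }
    Flush();
    return result;
}
```

Because no line ever contains a newline here, the continuation logic lives in the loop rather than in the pattern.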

The fourth bird · Accepted Answer · 2021-10-15 13:15:58Z

After matching, you can post-process the value of group 3 by removing each newline together with the spaces and // that follow it. The pattern is:

<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)

The pattern matches:

<\w\w:Value> SYMBOL: Match <, two word characters and :Value> SYMBOL: literally
(P[^=\n]*) Capture group 1, match P followed by any characters except = or a newline
= Match literally
(.*?) Capture group 2, match any characters except a newline, non-greedy
// Match literally
( Capture group 3
- .* Match the rest of the line
- (?: Non-capture group
  - \n[\p{Zs}\t]*//.* Match a newline, optional spaces or tabs, // and the rest of the line
- )* Close the non-capture group and repeat it zero or more times
) Close group 3

.NET regex demo | C# demo
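This pattern uses [\p{Zs}\t] for the indentation before //, while the other answer uses [\s-[\r\n]]; the comments under that answer note that [\s-[\r\n]] behaves like [^\S\r\n] and that [\p{Zs}\t] is narrower. A small check of how the three classes treat a handful of characters (the sample characters are chosen for illustration only):

```csharp
using System;
using System.Text.RegularExpressions;

// The three "horizontal whitespace" classes discussed in this thread,
// each anchored so it must match exactly one character.
var subtraction = new Regex(@"\A[\s-[\r\n]]\z"); // .NET class subtraction: \s minus CR and LF
var negated     = new Regex(@"\A[^\S\r\n]\z");   // not non-whitespace, not CR, not LF
var specific    = new Regex(@"\A[\p{Zs}\t]\z");  // Unicode space separators plus tab only

// Space, tab, no-break space, form feed, vertical tab, CR, LF.
char[] samples = { ' ', '\t', '\u00A0', '\f', '\v', '\r', '\n' };
foreach (char c in samples)
{
    string s = c.ToString();
    Console.WriteLine($"U+{(int)c:X4}: subtraction={subtraction.IsMatch(s)}, " +
                      $"negated={negated.IsMatch(s)}, specific={specific.IsMatch(s)}");
}
```

In .NET, \s also covers form feed and vertical tab, so those are the characters where [\p{Zs}\t] gives a different answer from the other two classes; for typical space or tab indentation all three behave the same.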

For example, printing only group 3 after the replacement:

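using System;
using System.Text.RegularExpressions;

string pattern = @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)";
string input = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                            // auswerten
    <AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

foreach (Match match in Regex.Matches(input, pattern))
{
    // Remove each line break plus the indentation and "//" that start the next line,
    // then trim the space left over at the start of the first comment line.
    Console.WriteLine(Regex.Replace(match.Groups[3].Value, @"\r?\n[\p{Zs}\t]+//", "").Trim());
}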

Output:

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges

edited Oct 15, 2021 at 13:15

answered Oct 15, 2021 at 12:28

The fourth bird

2 Comments

Wiktor Stribiżew Over a year ago

\r? is unnecessary, . already matches CR symbols. // can also start at the line beginning.

user15519784 Over a year ago

Thanks, but with that I am getting "// Projektierung D-Weg Freimeldung nicht // auswerten", not "Projektierung D-Weg Freimeldung nicht auswerten".
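The comments above point out that // can start at the very beginning of a line and report // remaining in the output. The replacement regex in the answer requires at least one space or tab before // ([\p{Zs}\t]+), so an unindented continuation line keeps its //. One way to tolerate both unindented lines and CRLF endings is to collapse every // together with its surrounding whitespace into a single space. This is a sketch: the sample input and the CleanComment helper are hypothetical, and the \s*//\s* cleanup is a substitute for the answer's replacement, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Matching pattern from the answer above, unchanged.
string pattern = @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)";

// Hypothetical input with CRLF line endings and a continuation line that
// starts directly with "//" (no indentation).
string input =
    "<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht\r\n" +
    "// auswerten\r\n" +
    "<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

foreach (Match match in Regex.Matches(input, pattern))
    Console.WriteLine(CleanComment(match.Groups[3].Value));

// Hypothetical helper: replaces every "//" together with the whitespace around it
// (spaces, tabs, CR, LF) by a single space, then trims both ends.
static string CleanComment(string raw) => Regex.Replace(raw, @"\s*//\s*", " ").Trim();
```

A trade-off of this cleanup: any // inside the comment text itself (for example in a URL) is collapsed as well.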
