I have created and tested this regex pattern: <\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)

 Regex regexPattern = new Regex(@"<\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)");
 var attributeChecker = regexPattern.Match(line);
 var attributeLongDescription = attributeChecker.Groups[3].ToString().Trim();

Here is the sample input:

<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges

The results I am getting from group 3 are:

Projektierung D-Weg Freimeldung nicht
Länge des Durchrutschweges

How can I get these results from group 3 instead?

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges
  • You cannot do that, you need to match all the lines below your pattern match that are continuation of the comment, and then post-process the result. Commented Oct 15, 2021 at 12:27
  • @WiktorStribiżew Could you please give me an example, how should I do that ? Commented Oct 15, 2021 at 12:28
  • I was working on the code but you have got an answer already. Commented Oct 15, 2021 at 12:31
  • @WiktorStribiżew If you have a better answer you can post yours and I will remove mine. Commented Oct 15, 2021 at 12:31
  • @AdrianHHH Now, the question is 1) different, 2) unclear as there is no input text, no sample to test the pattern against. kn1ghtx, please keep the question as it was, and if there is just a slight issue with the current solution, please drop a comment below the answer(s) and if it is a bigger thing, please consider asking a new question. Rolling back to the latest normal question for the time being. Commented Oct 18, 2021 at 21:45

2 Answers

You cannot capture disjoint parts of a string into a single capturing group. You need to match all the lines below your pattern match that are continuation of the comment, and then post-process the result.

You can use the following approach:

using System;
using System.Linq;                     // needed for Select
using System.Text.RegularExpressions;

var text = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";
// Group 3 captures the first comment plus any continuation lines that start with //
var matches = Regex.Matches(text, @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)");
foreach (Match m in matches)
{
    Console.WriteLine("--- A new match ---");
    Console.WriteLine($"Group 1: {m.Groups[1].Value}");
    Console.WriteLine($"Group 2: {m.Groups[2].Value}");
    // Split the captured text on //, trim each piece, and join the pieces with a space
    Console.WriteLine("Group 3: {0}",
        string.Join(" ",
            m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim())
        )
    );
}

Output:

--- A new match ---
Group 1: PDWFNA     
Group 2:  0;        
Group 3: Projektierung D-Weg Freimeldung nicht auswerten
--- A new match ---
Group 1: PDWLE      
Group 2:  0;        
Group 3: Länge des Durchrutschweges

The (.*(?:\n[\s-[\r\n]]*//.*)*) part captures into Group 3 the rest of the current line with .*, and then zero or more following lines that each start with zero or more whitespace characters other than CR and LF, followed by // and anything up to the end of the line.
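To make the [\s-[\r\n]] construct more concrete, here is a minimal sketch (assuming C# 9 or later for top-level statements; all variable names are made up for illustration). It checks every UTF-16 code unit to see whether the character class subtraction [\s-[\r\n]] and the negated class [^\S\r\n] ever disagree, and then tests a form feed against the narrower [\p{Zs}\t].

```csharp
using System;
using System.Text.RegularExpressions;

// Three ways of writing "horizontal whitespace"; \z anchors at the very end of the string.
var subtraction = new Regex(@"^[\s-[\r\n]]\z");  // \s minus CR and LF (.NET class subtraction)
var negated     = new Regex(@"^[^\S\r\n]\z");    // not (non-whitespace, CR or LF)
var narrow      = new Regex(@"^[\p{Zs}\t]\z");   // space separators and tab only

// Count the single-character strings on which the first two classes disagree.
int disagreements = 0;
for (int c = 0; c <= char.MaxValue; c++)
{
    string s = ((char)c).ToString();
    if (subtraction.IsMatch(s) != negated.IsMatch(s))
    {
        disagreements++;
    }
}
Console.WriteLine($"Disagreements between the first two classes: {disagreements}");

// A form feed is whitespace for \s, but it is neither a space separator nor a tab.
Console.WriteLine($"[\\s-[\\r\\n]] matches form feed: {subtraction.IsMatch("\f")}");
Console.WriteLine($"[\\p{{Zs}}\\t] matches form feed: {narrow.IsMatch("\f")}");
```

For input like the sample above, where the continuation lines are indented with plain spaces, any of the three classes should behave the same way.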

The string.Join(" ", m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim())) is one way of post-processing the Group 3 value. Here, it is split on the // substring, each resulting item is stripped of leading/trailing whitespace, and the items are joined into a single string with a space.

You may also use Regex.Replace(m.Groups[3].Value, @"\s*//\s*", " ").Trim() instead to make it shorter. The Trim() call removes the leading space that Group 3 keeps after the first //.
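As a rough sketch of that shorter variant (assuming C# 9 or later for top-level statements; JoinComment is a hypothetical helper name, not part of any library), the whole post-processing step can be wrapped in one function. The sample input is written with explicit \n so the result does not depend on how the source file stores line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, with explicit LF line endings.
var text = "<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht\n" +
           "                                          // auswerten\n" +
           "<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

var pattern = @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)";

// Hypothetical helper: replace each "//" marker and the whitespace around it
// (including the line break before it) with a single space, then trim the ends.
string JoinComment(string raw) => Regex.Replace(raw, @"\s*//\s*", " ").Trim();

foreach (Match m in Regex.Matches(text, pattern))
{
    Console.WriteLine(JoinComment(m.Groups[3].Value));
}
```

Note that this variant would also collapse a // that appears inside the comment text itself, so it is only a good fit when the descriptions never contain that sequence.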

8 Comments

Just FYI: if the comment lines can contain blank lines in between replace [\s-[\r\n]] with just \s.
Nice, is [\s-[\r\n]] the same as [\p{Zs}\t]?
@Thefourthbird [\s-[\r\n]] is more equal to [^\S\r\n]. [\p{Zs}\t] is more specific.
@kn1ghtx You probably want to exclude the comments that are followed by asterisks, try <\w{2}:Value> SYMBOL: (P.*)=(.*)//(?!\s*\*)(.*(?:\n[\s-[\r\n]]*//(?!\s*\*).*)*)
@kn1ghtx Of course not. \n in the regex pattern needs to find an LF char, and when you read line by line, there is no LF in the input.
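If the input really has to be processed one line at a time, the multi-line pattern cannot work, but the continuation lines can be merged by hand. The following is only a sketch under that assumption (C# 9 or later for top-level statements; the names header, continuation, descriptions and inComment are invented here). It splits the sample text on \n to stand in for a line-by-line reader and appends every line that consists only of whitespace and a // comment to the most recent description.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input from the question, with explicit LF line endings.
var text = "<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht\n" +
           "                                          // auswerten\n" +
           "<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

// Single-line patterns: one for a SYMBOL line, one for a comment-only continuation line.
var header = new Regex(@"^<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*)$");
var continuation = new Regex(@"^\s*//(.*)$");

var descriptions = new List<string>();
bool inComment = false; // true while the previous line belonged to a description

// Splitting on '\n' stands in for reading the lines one by one from a file.
foreach (var line in text.Split('\n'))
{
    Match h = header.Match(line);
    if (h.Success)
    {
        // A new SYMBOL line starts a new description.
        descriptions.Add(h.Groups[3].Value.Trim());
        inComment = true;
        continue;
    }

    Match c = continuation.Match(line);
    if (inComment && c.Success)
    {
        // Append the continuation text to the most recent description.
        descriptions[descriptions.Count - 1] += " " + c.Groups[1].Value.Trim();
    }
    else
    {
        // Any other line ends the current description.
        inComment = false;
    }
}

foreach (var d in descriptions)
{
    Console.WriteLine(d);
}
```

Reading the whole file into one string and using the multi-line pattern is likely simpler when the file is small enough to hold in memory.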
After matching, you can post-process the group 3 value, removing each newline together with the spaces and the // that follow it.

<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)

The pattern matches:

  • <\w\w:Value> SYMBOL: Match literally
  • (P[^=\n]*) Capture group 1, match P followed by any characters other than = or a newline
  • = Match literally
  • (.*?) Capture group 2, match any character except a newline, as few as possible (non-greedy)
  • // Match literally
  • ( Capture group 3
    • .* Match the rest of the line
    • (?: Non capture group
      • \n[\p{Zs}\t]*//.* Match a newline, optional spaces and // and the rest of the line
    • )* Close
  • ) Close group 3

For example, printing only group 3 after the replacement:

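string pattern = @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)";
string input = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                            // auswerten
    <AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

foreach (Match match in Regex.Matches(input, pattern))
{
    // Remove each line break plus the indentation and // of a continuation line.
    // [\p{Zs}\t]* (not +) also covers a // at the very start of a line.
    // Trim() drops the space that group 3 keeps after the first //.
    Console.WriteLine(Regex.Replace(match.Groups[3].Value, @"\r?\n[\p{Zs}\t]*//", "").Trim());
}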

Output

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges

2 Comments

\r? is unnecessary, . already matches CR symbols. // can also start at the line beginning.
thanks, but with that I am getting // Projektierung D-Weg Freimeldung nicht // auswerten not Projektierung D-Weg Freimeldung nicht auswerten
