
I have several regular expression blocks that parse a C++ file for certain info. I'm trying to change my regex so that it skips commented-out blocks. The code below still captures them:

Function Get-CaseContents{
  [cmdletbinding()]
  Param ( [string]$parsedCaseMethod, [string]$basePathFull)
  Process
  {
      # split into separate "case" blocks.
      # (the funky "(?=...)" preserves the delimiter)
      $blocks = $parsedCaseMethod -split "(?=case (.*):)";

      $pattern = `
      "_stprintf[\s\S]*?_T\D*" +
      "(?<sdkErr>[x\d]+)" +
      "\D[\s\S]*?" +
      "\((?<sdkDesc>(.+?)`")\)" +
      "[\s\S]*?" +
      "(outError\s*=\s*(?<sdkOutErr>[a-zA-Z_0-9]*))" +
      "[\s\S]*?" +
      "(?<sdkSeverity>outSeverity\s*=\s[a-zA-Z_]*)";

      # note - skip first block as it's the preamble before the first "case"
      $result = $blocks `
      | select-object -skip 1 `
      | select-string -pattern $pattern `
      | foreach-object {
          $match = $_.Matches[0];
          $tmp_removeParen = $match.Groups['sdkDesc'] -replace '\(|\)|%s|\"',"."
          [PSCustomObject] [ordered] @{
              "sdkErr"      = $($match.Groups['sdkErr'])
              "sdkDesc"     = $($tmp_removeParen)
              "sdkOutErr"   = $($match.Groups['sdkOutErr'])
              "sdkSeverity" = ($match.Groups['sdkSeverity'] -split '_')[-1]
          }
      };
      return $result
  }#End of Process
}#End of Function 

That gets all of the targeted contents plus the commented-out blocks, which I want to avoid. The C++ code being parsed looks like this:

        case kRESULT_STATUS_SHORTAGE:  
            _stprintf(outDevStr, _T("2000 - (Shortage issue) - %s(Shortage)"), errorStr);
            outError = HOP_SHORTAGE;
            outSeverity = CCC_INFORMATION;
            break;


// New Error codes(really old errors broken out with unique error codes) - not all have this line
        //case kRESULT_STATUS_User_CoverOpenErr:    //comment here  
        //  _stprintf( outDevStr, _T("2900 - (Cover Open) - %s(Upper cover open.)"), errorStr);
        //  outError    = HOP_COVER_OPEN;
        //  outSeverity = CCC_INFORMATION;
        //  break;

I tried changing the split in the first part to the line below, but then it returns no results. I feel that if I can just figure out how to exclude a case block whose case line is commented out, it will fix everything.

$blocks = $parsedCaseMethod -split "(?=^[\s]+case (.*):)"; #didn't work - nothing in $result

Any help would be appreciated. Thanks! :)

This is with PowerShell 5.1 and VS Code.

  • I thought this looked familiar. For this slightly changed problem statement I still maintain that running things through a C++ preprocessor to strip the comments would be an effective solution, and certainly more reliable than regexes. If installing an actual C++ preprocessor is too much effort, a small comment-stripping routine that runs before the regex parsing would probably be simpler than building comment handling into the existing regexes. Commented Mar 23, 2022 at 14:59
  • @JeroenMostert - thanks for the idea, but I still need to keep my current design. This is about the 6th device for which I parse the data out of the cpp file, which is why the info being parsed is different from the link you showed. Preprocessing to remove the commented-out lines was a good idea, but I knew that if I could just change my $blocks line to exclude the blocks that have anything other than spaces before the case part, it would fix it, so I'm going with what mkelement0 answered below. Commented Mar 23, 2022 at 15:20

1 Answer

The simplest approach is probably to eliminate all comment lines in a first step, before splitting:

$blocks = $parsedCaseMethod -replace '(?m)^\s*//.*' -split '(?=case (.*):)'

Note:

  • To keep the regex simple, the above effectively replaces the comment lines with empty lines (it does, however, also remove empty and all-whitespace lines that directly precede a comment line, because \s matches line breaks too). If you'd rather remove each comment line together with its trailing line break, use -replace '(?m)^\s*//.*(?:\r?\n)?'

  • The assumption is that your C++ code contains no multi-line comments (/* ... */) and no //-prefixed lines inside C++11 raw string literals.
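
As a rough illustration of the two-step approach above, here is a minimal sketch that strips the comment lines and then splits. The sample input is hypothetical (modelled on the C++ excerpt in the question), and the variable names $withoutComments and $realBlocks are introduced only for this example.

```powershell
# Hypothetical sample input, modelled on the C++ excerpt in the question.
$parsedCaseMethod = @'
        case kRESULT_STATUS_SHORTAGE:
            _stprintf(outDevStr, _T("2000 - (Shortage issue) - %s(Shortage)"), errorStr);
            outError = HOP_SHORTAGE;
            outSeverity = CCC_INFORMATION;
            break;

        //case kRESULT_STATUS_User_CoverOpenErr:    //comment here
        //  _stprintf( outDevStr, _T("2900 - (Cover Open) - %s(Upper cover open.)"), errorStr);
        //  outError    = HOP_COVER_OPEN;
        //  outSeverity = CCC_INFORMATION;
        //  break;
'@

# Step 1: remove every line whose first non-whitespace characters are "//",
# including its trailing line break (the variant from the note above).
$withoutComments = $parsedCaseMethod -replace '(?m)^\s*//.*(?:\r?\n)?'

# Step 2: split in front of each "case ...:" exactly as in the answer.
$blocks = $withoutComments -split '(?=case (.*):)'

# Keep only the pieces that contain an actual _stprintf call; the commented-out
# block was removed in step 1, so it cannot show up here.
$realBlocks = @($blocks | Where-Object { $_ -match '_stprintf' })
$realBlocks
```

The remaining pipeline from the question (select-object -skip 1, select-string, and so on) should be able to consume $blocks unchanged, since only the input to the split differs.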

1 Comment

@Michele, (?m) is the inline form of the Multiline regex option, which makes ^ and $ match on the start and end of individual lines - please see the regex101.com link I've just added to the answer.
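
To make the effect of (?m) concrete, here is a small sketch using a hypothetical two-line string; the names $text, $withoutMultiline, and $withMultiline exist only in this example. It would also be consistent with the ^-anchored split attempt in the question finding nothing to split on, since that attempt did not use (?m).

```powershell
# Hypothetical two-line input; the "case" line is indented and is not the first line.
$text = "preamble`n    case kFOO:"

# Without (?m), ^ anchors only at the very start of the whole string,
# so an indented "case" on a later line cannot match.
$withoutMultiline = $text -match '^\s+case'

# With (?m), ^ also anchors at the start of every line.
$withMultiline = $text -match '(?m)^\s+case'

"without (?m): $withoutMultiline"
"with (?m):    $withMultiline"
```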
