2

Can a std::regex pattern be written with whitespace and line breaks that are ignored, purely for readability? Is there an option like Python's re.VERBOSE?

Without verbose:

charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")

With verbose:

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)
  • I don't think so. You could use a raw string literal and pass it to another function that strips out its whitespace and then compiles it into a regex, but you'd have to write that stripping function yourself. Commented Jun 10, 2016 at 14:16
  • You can split the string literal into multiple lines, as in your first example, and put comments on those lines. Commented Jun 10, 2016 at 14:30

2 Answers

8

Simply split the string into multiple literals and use C++ comments like so:

std::regex rgx( 
   "&[#]"                // Start of a numeric entity reference
   "("
     "0[0-7]+"           // Octal form
     "|[0-9]+"           // Decimal form
     "|x[0-9a-fA-F]+"    // Hexadecimal form
   ")"
   ";"                   // Trailing semicolon
);

The compiler then concatenates them into "&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);". Whitespace inside the literals stays part of the pattern, so you can still match literal spaces where you need them. The downside is that the extra quotation marks make the pattern somewhat laborious to write.
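As a quick way to convince yourself that the split literals behave like the single-line pattern, here is a minimal sketch. The names is_charref and is_charref_self_check are made up for illustration and are not part of the answer.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is exactly one numeric character
// reference such as "&#65;" or "&#x1F;".
inline bool is_charref(const std::string& s) {
    // Adjacent string literals are concatenated by the compiler, so the
    // regex below is "&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);".
    static const std::regex rgx(
        "&[#]"                // Start of a numeric entity reference
        "("
          "0[0-7]+"           // Octal form
          "|[0-9]+"           // Decimal form
          "|x[0-9a-fA-F]+"    // Hexadecimal form
        ")"
        ";"                   // Trailing semicolon
    );
    return std::regex_match(s, rgx);
}

// Small self-check; call it from main() or a unit test.
inline void is_charref_self_check() {
    assert(is_charref("&#65;"));
    assert(is_charref("&#x1F;"));
    assert(!is_charref("&# 65;"));  // whitespace in the input is not ignored
}
```

The layout whitespace sits between the literals, not inside them, so it never reaches the regex engine.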


5
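#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

inline std::string remove_ws(std::string in) {
  // Take unsigned char: passing a negative char to std::isspace is undefined,
  // and the bare name std::isspace is ambiguous as a predicate.
  in.erase(std::remove_if(in.begin(), in.end(),
      [](unsigned char c) { return std::isspace(c) != 0; }), in.end());
  return in;
}

inline std::string operator""_nows(const char* str, std::size_t length) {
  return remove_ws(std::string(str, length));
}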

This doesn't support # comments yet, but adding that is easy: write a function that strips them from the string, then combine the two:

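#include <regex>

std::string remove_comments(std::string const& s)
{
  // Treat '#' as a comment start only at the beginning of the input or after
  // whitespace, so the '#' in "&[#]" survives. The newline that ends the
  // comment is left in place for remove_ws to strip.
  static const std::regex comment_re("(^|\\s)#[^\\n]*");
  return std::regex_replace(s, comment_re, "");
}
// Not thoroughly tested: a '#' that follows whitespace inside a character
// class would still be treated as a comment.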

std::string operator""_verbose(const char* str, std::size_t length) {
  return remove_ws( remove_comments( {str, str+length} ) );
}

Once finished, we get:

std::regex charref(R"---(
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
)---"_verbose);

and done.
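If you want behaviour closer to Python's re.VERBOSE, a single hand-written pass can keep escaped characters (such as "\ " or "\#") and everything inside a character class, while dropping other whitespace and # comments. The following is only a sketch under those assumptions. The names strip_verbose, _vre and vre_self_check are hypothetical, and the character-class tracking is deliberately simple (it does not understand nested forms such as [[:alpha:]] followed by more class content).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: removes insignificant whitespace and '#' comments
// from a pattern, roughly in the spirit of Python's re.VERBOSE.
inline std::string strip_verbose(const std::string& in) {
    std::string out;
    bool in_class = false;  // true while inside [...]
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            out += c;            // keep the backslash...
            out += in[++i];      // ...and the character it escapes
        } else if (in_class) {
            out += c;            // whitespace and '#' are literal in a class
            if (c == ']') in_class = false;
        } else if (c == '[') {
            out += c;
            in_class = true;
        } else if (c == '#') {
            // Skip the comment; the newline itself is dropped on the next
            // iteration as ordinary whitespace.
            while (i + 1 < in.size() && in[i + 1] != '\n') ++i;
        } else if (c != ' ' && c != '\t' && c != '\n' &&
                   c != '\r' && c != '\f' && c != '\v') {
            out += c;
        }
    }
    return out;
}

// Hypothetical literal suffix; named _vre so it does not clash with _verbose.
inline std::regex operator""_vre(const char* str, std::size_t length) {
    return std::regex(strip_verbose(std::string(str, length)));
}

// Small self-check; call it from main() or a unit test.
inline void vre_self_check() {
    const std::regex charref = R"---(
     &[#]                # Start of a numeric entity reference
     (
         0[0-7]+         # Octal form
       | [0-9]+          # Decimal form
       | x[0-9a-fA-F]+   # Hexadecimal form
     )
     ;                   # Trailing semicolon
    )---"_vre;
    assert(std::regex_match("&#x1F;", charref));
    assert(!std::regex_match("&#;", charref));
}
```

Returning std::regex directly from the literal operator means the stripping happens once, when the regex object is constructed, not on every match.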
