5

I am trying to write a regular expression that will match a string that contains name-value pairs of the form:

<name> = <value>, <name> = <value>, ...

Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:

regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");

This works well, but of course it fails when the string I am trying to match contains a <value> with an escaped quote. I am struggling to work out how to solve this. I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' named value below:

difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
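To make the failure concrete, here is a minimal sketch that runs the pattern above against this sample line. The class and method names (`LazyQuoteDemo`, `Capture`) are hypothetical, and the `Regex.Escape` call is an addition that is not in the original snippet.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuoteDemo
{
    // The sample line from the question, written as a verbatim string
    // (each "" below stands for one literal quote character).
    public const string Input =
        @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";

    // Applies the question's pattern for the given field name and returns
    // capture group 1 (an empty string when nothing matches).
    public static string Capture(string fieldName)
    {
        // Regex.Escape is an assumption added here so that a name containing
        // regex metacharacters is treated literally.
        var regex = new Regex(Regex.Escape(fieldName) + @"\s*=\s*""(.*?)""");
        return regex.Match(Input).Groups[1].Value;
    }
}
```

`Capture("easy")` returns `one`, but `Capture("difficult")` returns only `\\\a\b\'\`: the lazy `.*?` stops at the first quote it meets, which is the escaped one.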

I would appreciate a decent explanation with your answers, I want to learn, rather than copy ;-)

    Heh ... perhaps I should look at the SO source code. I just noticed the syntax highlighter clearly understands string literals with escaped quotes! Commented Feb 10, 2011 at 5:35

3 Answers

14

Try this to capture the key and value:

(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")

As a bonus, it also works on verbatim strings.

C# Examples: https://dotnetfiddle.net/vQP4rn

Here's an annotated version:

string pattern = @"
(\w+)\s*=\s*    # key =
(               # Capturing group for the string
    @""               # verbatim string - match literal at-sign and a quote
    (?:
        [^""]|""""    # match a non-quote character, or two quotes
    )*                # zero times or more
    ""                # literal quote
|               # OR - regular string
    ""              # string literal - opening quote
    (?:
        \\.         # match an escaped character,
        |[^\\""]    # or a character that isn't a quote or a backslash
    )*              # zero or more times
    ""              # string literal - closing quote
)";
MatchCollection matches = Regex.Matches(s, pattern, 
                                        RegexOptions.IgnorePatternWhitespace);

Note that the regular-string branch allows any character to be escaped, unlike C#, and also allows newlines. It should be easy to tighten if you need validation, but it should be fine for parsing.
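Since the question already knows which name it is looking for, the same pattern can be anchored to that name instead of `(\w+)`. The following is only a sketch under that assumption: `FieldValueFinder` and `FindLiteral` are hypothetical names, and the `\b` plus `Regex.Escape` prefix is an addition rather than part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldValueFinder
{
    // Returns the raw literal (quotes and escapes included) assigned to
    // fieldName, or an empty string when the field is not found.
    public static string FindLiteral(string input, string fieldName)
    {
        string pattern =
            // Assumption: names are word-like, so \b marks the start of the name.
            @"\b" + Regex.Escape(fieldName) + @"\s*=\s*" +
            // Same two alternatives as above: verbatim string | regular string.
            @"(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")";
        return Regex.Match(input, pattern).Groups[1].Value;
    }
}
```

For the question's sample line, `FindLiteral(input, "difficult")` returns the whole literal `"\\\a\b\'\"\0\f \t\v"`, escaped quote included. One limitation of this sketch: it does not check whether `name =` itself occurs inside an earlier string value, so it could match there first.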


3

This should match only the string literal part (you can tack on whatever else you want to the beginning/end):

Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");

And if you want a pattern that doesn't allow multi-line string literals (regular C# string literals cannot span lines):

Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");
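Counting backslashes in a regular C# string is error-prone, so here is a sketch of the same two ideas written as verbatim strings, where each regex backslash is typed once. The class and field names are hypothetical, and non-capturing groups are used in place of the capturing ones as a stylistic assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringLiteralPatterns
{
    // An escaped character, or any character other than a backslash or quote.
    // Raw line breaks inside the literal are accepted.
    public static readonly Regex AnyLine =
        new Regex(@"""(?:\\.|[^\\""])*""");

    // Same idea, but raw line breaks may not appear inside the literal,
    // either after a backslash or on their own.
    public static readonly Regex SingleLine =
        new Regex(@"""(?:\\[^\n\r]|[^\\""\n\r])*""");
}
```

With an input whose quoted text contains a real line break, `AnyLine.IsMatch` returns true while `SingleLine.IsMatch` returns false. Both accept a literal such as `"hi \" there"`.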


-1

You can use this:

@"  \s* = \s* (?<!\\)""  (.* ) (?<!\\)"""

It's almost like yours, but instead of using "", I used (?<!\\)"" to match a quote only when it is not preceded by a \, so it won't match escaped quotes.

2 Comments

What about a string like "c:\" (escaped to "c:\\")? Your regex will not match the ending quote. Also, (?<!\\) does nothing before the first quote, since it follows an equals sign or spaces.
Right, I didn't think about it ;)
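The objection in the comment can be checked with a short sketch. The lookbehind pattern below is a simplified adaptation of the one in this answer (lazy, and without the redundant leading lookbehind), not the answer's exact text, and the class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Lookbehind approach: the closing quote must not follow a backslash.
    public static readonly Regex Lookbehind =
        new Regex(@"""(.*?)(?<!\\)""");

    // Escape-aware approach: a backslash always consumes the next character,
    // so an escaped backslash (\\) cannot be mistaken for an escaped quote.
    public static readonly Regex EscapeAware =
        new Regex(@"""((?:\\.|[^\\""])*)""");

    // Sample text containing the literal "c:\\" followed by another field.
    public const string Input = @"path = ""c:\\"", name = ""x""";
}
```

On `Input`, `EscapeAware` captures `c:\\`, while `Lookbehind` skips the real closing quote (it follows a backslash) and captures `c:\\", name = ` instead.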
