I am working on a regex pattern to extract strings that start with " and end with ". The problem is that a string may also contain a " preceded by an escape character, like \". For example: "This is a \"Demo\" text". I know very little about the lookbehind operator. Is this possible with a single regex pattern?

Thanks

  • Are you using Java to parse Java? If not, what regex engine are you using? Here's a start: (?<!\\)"(?:[^\\]|\\.)*?". Also, what have you tried? Commented Mar 11, 2014 at 7:48
  • You should have a look here Commented Mar 11, 2014 at 8:54

1 Answer

It should work like this:

"(?:\\.|[^"])+"

No lookahead or lookbehind is required. The pattern works as follows:

  1. Look for a " and consume it.
  2. Check whether the next two characters are a backslash followed by any character (this matches \\, where the first backslash escapes the second, as well as \"). If so, consume both characters and repeat Step 2. If not, go to Step 3.
  3. Check whether the next character is anything other than a ". If so, consume it and go to Step 2. If not (it is a "), go to Step 4.
  4. Consume the closing ", which must be at this position.
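The steps above can be tried out with a minimal Java sketch. The class name QuotedStrings and the method extract are hypothetical names chosen for illustration; only java.util.regex is used. Note that in a Java string literal every backslash and every " of the pattern has to be escaped once more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class QuotedStrings {
    // The answer's pattern "(?:\\.|[^"])+" written as a Java string literal.
    private static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"])+\"");

    // Returns every quoted string found in the input, surrounding quotes included.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: say "This is a \"Demo\" text" and "plain"
        String text = "say \"This is a \\\"Demo\\\" text\" and \"plain\"";
        for (String s : extract(text)) {
            System.out.println(s);
        }
        // Prints:
        // "This is a \"Demo\" text"
        // "plain"
    }
}
```

Because the pattern uses + rather than *, an empty string literal "" is not matched on its own; swapping in * would be one way to allow it, if that is wanted.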

As HamZa pointed out, this regex fails if a " appears outside of a string and is not meant to start one. In Java code, for example, this is the case if you have something like

Character c = '\"'

(" as a char) or

if (foo) { /* chosen "sometimes */ String g = "bar"; }

(random " inside a comment)
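The comment case can be reproduced with a small sketch (StrayQuoteDemo and firstMatch are hypothetical names used only for this illustration). It shows that the stray " inside the comment is taken as the start of a string, so the match ends at the opening quote of "bar" instead of covering "bar" itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class StrayQuoteDemo {
    // Same pattern as in the answer: "(?:\\.|[^"])+"
    static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"])+\"");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String code = "if (foo) { /* chosen \"sometimes */ String g = \"bar\"; }";
        // The quote inside the comment pairs up with the opening quote of "bar".
        System.out.println(firstMatch(code)); // prints "sometimes */ String g = "
    }
}
```

Avoiding this would likely need something that understands comments and character literals as well, such as a tokenizer, rather than a single pattern for string literals alone.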


6 Comments

Aww. Too late. Could someone explain to me if lookbehind (like in HamZa's comment) is necessary here or if my solution works, too?
This would fail for "This a \\\\" this shouldn't be matched \". Ok, it's an edge case, I know :)
I just fixed that issue, since it came to my mind just as I posted it ;)
@TheM You need the lookbehind since it will fail for do \" not match this but "match this". Anyways, +1
Ah, okay, now I understand why that's needed. I was assuming the text to be Java (or some other) code, which wouldn't allow a \" to occur anywhere outside a string (except for '\"' of course, but that's a real edge case ;) ). Thank you for pointing that out!
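To make the difference discussed in these comments concrete, here is a small sketch comparing the answer's pattern with the lookbehind pattern from HamZa's comment on the input do \" not match this but "match this". LookbehindDemo and firstMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class LookbehindDemo {
    // The answer's pattern: "(?:\\.|[^"])+"
    static final Pattern PLAIN = Pattern.compile("\"(?:\\\\.|[^\"])+\"");
    // The pattern from HamZa's comment: (?<!\\)"(?:[^\\]|\\.)*?"
    static final Pattern WITH_LOOKBEHIND =
            Pattern.compile("(?<!\\\\)\"(?:[^\\\\]|\\\\.)*?\"");

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Raw text: do \" not match this but "match this"
        String text = "do \\\" not match this but \"match this\"";
        // Without lookbehind, the escaped quote is treated as an opening quote.
        System.out.println(firstMatch(PLAIN, text));           // prints " not match this but "
        // With lookbehind, a quote preceded by a backslash cannot open a string.
        System.out.println(firstMatch(WITH_LOOKBEHIND, text)); // prints "match this"
    }
}
```

One caveat: the lookbehind inspects only a single preceding character, so a real opening quote that directly follows an escaped backslash (\\") would be skipped as well.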