
I need to match incorrect backslashes in a text. The following text is an example:

\.br\ Random Words \.br\\1 Testing\.br\2\ Check

So the \.br\ are correct, however the backslashes in \1 and 2\ are not.

So I attempted a regular expression to match any \ which is not followed by .br, but that failed because it also matched the closing \ in \.br\

I then looked up a few similar questions on Stack Overflow, and most of them stated that a series of lookaheads can be used as an 'and' operator, so I tried this:

/(?!\\\.br)\\(?!\.br\\)/

What I attempted to do was match any backslash that was neither preceded by a \.br nor followed by a .br\ but it didn't seem to work.

Any help would be appreciated. I hope I haven't missed out any details in the question.

Thanks,

Sid

  • Are you using the Perl or the JavaScript regex engine? You could use a negative lookbehind assertion to accomplish this if you are using Perl. However, JavaScript regex does not support negative lookbehind. Commented Feb 4, 2014 at 15:33
  • I actually had to do this on both engines. One of my programs was a Perl script and the other was JavaScript. I reckon ikegami's workaround could solve the issue, but I'm unsure how it would work if there were other valid escapes in the mix, like \F\ \S\ \T\ Commented Feb 4, 2014 at 15:55

3 Answers

6

Close. (?!PAT) means "not followed by PAT". You want "not preceded by PAT", which is written (?<!PAT).

(?<!\\\.br)\\(?!\.br\\)

The following will be a bit faster:

\\(?<!\\\.br\\)(?!\.br\\)
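
For readers who want to try this outside Perl, here is a minimal sketch in JavaScript. It assumes an engine that implements ES2018 lookbehind (current Node.js and most current browsers do; engines from the time of this thread did not). The names `attempted`, `corrected` and `indexesOf` are made up for the example. It compares the pattern from the question with the corrected one on the sample text:

```javascript
// Sample text from the question (backslashes doubled for the JS string literal).
const text = "\\.br\\ Random Words \\.br\\\\1 Testing\\.br\\2\\ Check";

// Pattern from the question: both assertions are lookaheads.
const attempted = /(?!\\\.br)\\(?!\.br\\)/g;
// Corrected pattern: the first assertion is a lookbehind (assumes ES2018+).
const corrected = /(?<!\\\.br)\\(?!\.br\\)/g;

// Collect the index of every backslash a pattern matches.
const indexesOf = (re) => Array.from(text.matchAll(re), (m) => m.index);

console.log(indexesOf(attempted)); // also hits the closing \ of each \.br\
console.log(indexesOf(corrected)); // only the backslash before "1" and the one after "2"
```

The first pattern fails because a lookahead placed in front of `\\` still looks forward from the backslash itself, so it never inspects the text before it.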

6 Comments

Assuming the OP is using Perl regex, this would work great; however, it would not work with the JavaScript regex engine.
@MElliott, There are many ways around that. Which one to use depends on the bigger picture. For example, removal of "bad" backslashes: s/\\(\.br\\)?/ $1 ? "\\.br\\" : "" /eg
EDIT 2: What if \.br\ is not the only valid use of backslashes? What if \F\, \S\ and a few others are valid too? Would your JavaScript workaround still work? Thanks
s/\\((?:\.br|F|S)\\)?/ $1 ? "\\$1" : "" /eg
Again, thanks a lot Sir. Would it be at all possible to give me a very brief explanation of what the code is doing? This format of replacement seems new to me.
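
As a rough illustration of what that substitution does, here is a sketch of the same idea in JavaScript, where a replacement callback plays the role of Perl's /e flag. The names `cleaned` and `escapeBody` are made up for this example, and F and S stand in for whatever other escapes are valid:

```javascript
// Sample text from the question (backslashes doubled for the JS string literal).
const text = "\\.br\\ Random Words \\.br\\\\1 Testing\\.br\\2\\ Check";

// Every match starts at a backslash. The optional group captures the rest of
// a valid escape (".br\", "F\" or "S\") when one follows.
const cleaned = text.replace(/\\((?:\.br|F|S)\\)?/g, (match, escapeBody) =>
  // Group matched: valid escape, put it back unchanged.
  // Group did not match (undefined): stray backslash, replace with nothing.
  escapeBody ? match : ""
);

console.log(cleaned);
```

On the sample text this leaves `\.br\ Random Words \.br\1 Testing\.br\2 Check`. No lookbehind is needed, because the valid escapes are consumed as part of the match and so are never re-examined.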
2

I would use Perl, with a \G anchor and the \K metacharacter (and some atomic/possessive parts to improve efficiency):

\G(?>\\\.br\\|[^\\]++)*+\K\\

It should be faster than using lookarounds, since there's no duplication of matches (going over the same substring more than once, which is what lookarounds do).

regex101 demo.

The two matches complete in 24 and 21 steps respectively (as opposed to 36 and 22 steps with lookarounds, plus 4 failing steps).
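
JavaScript has no \G, \K, atomic groups or possessive quantifiers, so this pattern cannot be ported directly. A loosely related sketch, which is a different technique and not the pattern above, consumes valid escapes and plain text first and captures only a leftover backslash, so no lookaround is involved. The names `tokenizer` and `stray` are illustrative:

```javascript
// Sample text from the question (backslashes doubled for the JS string literal).
const text = "\\.br\\ Random Words \\.br\\\\1 Testing\\.br\\2\\ Check";

// Alternation order matters: a whole \.br\ escape is consumed first, then a
// run of non-backslash text; only a backslash that fits neither is captured.
const tokenizer = /\\\.br\\|[^\\]+|(\\)/g;

const stray = [];
for (const m of text.matchAll(tokenizer)) {
  // Group 1 is set only when the third alternative matched.
  if (m[1] !== undefined) stray.push(m.index);
}

console.log(stray); // indexes of the stray backslashes
```

Unlike the \K version, each match here covers more than the stray backslash, so the capture group has to be checked explicitly.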

2 Comments

That doesn't just match the \ as requested. That means it can't be inserted into a larger pattern, but that may be acceptable. I definitely use \K whenever possible.
The efficiency bit is something I fail to take into consideration, so thanks for pointing that out. I have always used regex101 and did not realize that it shows the number of steps a match takes. Thanks for the help; I would upvote this too, but my reputation won't let me.

0

(?:\\(?!\.br)\\)+(\S+)

The regex above will capture those characters inside backslashes that are not .br.

*Please note that the number 2 in \.br\2\ will not be captured as .br\ is correctly typed.
