I'm having trouble with a certain RegEx pattern for later use in JavaScript.

We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix. But we do need to find all the "network path" strings inside it, following these rules:

A. The matches always start with 2 backslashes.
B. The match should stop at the first occurrence of any one of these:

  1. A < character
  2. A space
  3. A line feed
  4. A carriage return
  5. A & character
  6. A literal "\r" or "\n" string (but only if occurring at end of line)

We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page: https://regex101.com/r/T4cDOL/5
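
To illustrate the gap, here is a minimal sketch of what that pattern does with rule 6. The input line is a made-up example, not text from the real data. A negated character class can only exclude single characters, so it has no way to stop at the two-character sequence backslash + "n":

```javascript
// The pattern from the question, unchanged.
const almost = /\\\\[^ &<\s]*/gi;

// Hypothetical input: a path followed by a literal backslash + "n" at end of line.
// String.raw keeps the backslashes as-is, so "\n" here is two characters.
const line = String.raw`\\host\share\file.txt\n`;

const found = line.match(almost) ?? [];

// The match runs through the trailing literal "\n" instead of stopping before it.
console.log(found[0] === line);
```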

Even if we get it working, the RegEx has to be further "escape escaped" before we put it in our JavaScript code, and that's also not working as expected.

1 Answer

From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.

You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):

const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

That says:

  • Match starting with \\
  • Take everything prior to the lookahead (non-greedy)
  • Lookahead: An alternation of:
    • A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
    • A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
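
As a rough usage sketch of the points above, the regex can be run over some sample text with String.prototype.match. The sample lines and the names `sample` and `paths` are invented for illustration; they are not taken from the question's data:

```javascript
// The regex from the answer, unchanged.
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// Hypothetical sample text. String.raw keeps every backslash literal,
// so "\n" inside these lines is a backslash followed by "n", not a newline.
// The lines themselves are joined with real newline characters.
const sample = [
  String.raw`See \\server\share\docs for details`,
  String.raw`<a>\\host\nancy\files\n`,
  String.raw`\\fs01\public&more`,
].join("\n");

// With the g flag, match() returns every match (or null if there are none).
const paths = sample.match(rex) ?? [];

for (const p of paths) {
  console.log(p);
}
// Line 1 stops at the space, line 2 keeps "\nancy" but stops before the
// literal "\n" at end of line, and line 3 stops at the "&".
```

One thing to be aware of: the lookahead needs some delimiter to follow the path, so a path that runs to the very end of the input with nothing after it would not be matched by this pattern.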

Updated regex101

You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):

const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
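
On the "escape escaped" concern from the question: a regex literal like the ones above can go into JavaScript source as-is. Extra escaping is only needed when the pattern is passed to the RegExp constructor as an ordinary string, where every backslash has to be doubled. A small sketch comparing the forms (the variable names are just for illustration):

```javascript
// Regex literal: written exactly as in the answer, no extra escaping.
const literal = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// Same pattern as an ordinary string: each backslash in the pattern is doubled.
const fromString = new RegExp("\\\\\\\\.*?(?=[ &<\\r\\n]|\\\\[rn](?:$| ))", "gmi");

// String.raw avoids the doubling, so the pattern text reads like the literal.
const fromRaw = new RegExp(String.raw`\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))`, "gmi");

// All three should carry the same pattern text.
console.log(literal.source === fromString.source);
console.log(literal.source === fromRaw.source);
```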

8 Comments

I added some more 'example text' where your idea ALMOST works, but not always: regex101.com/r/T4cDOL/5
I think you should also handle the case where the match goes to the end of the string/line.
@SusanSuzy - Please update your question to put all of the information necessary to answer it in the question, not just linked. Be sure to show what should match. Two reasons: People shouldn't have to go off-site to help you; and links rot, making the question and its answers useless to people in the future. Please put a minimal reproducible example in the question. More: How do I ask a good question? and Something in my web site or project doesn't work. Can I just paste a link to it?
@NiettheDarkAbsol - Not a bad idea, I'll add \r\n to the character class.
regex101.com/r/T4cDOL/6 -- seems like some of the "paths" might have \n as a substring, so that token isn't always going to be a delimiter. The whole thing is awkward tbh XD