
I want to use a regex in JavaScript to replace some words in a multi-line text. Original text:

Doka (1) is 20 years old. Doka, (2) Sole are my friends. Sole told me "Doka (3) is a nice gỉrl!"
Doka: (4) student of Hamma school.
I met Doka (5) yesterday.

Expected result after the replacement:

Bob (1) is 20 years old. Bob, (2) Sole are my friends. Sole told me "Doka (3) is a nice gỉrl!"
Doka: (4) student of Hamma school.
I met Bob (5) yesterday.

In this example, I want to replace occurrences 1, 2 and 5 of "Doka" with "Bob". An occurrence should be replaced only if it is:

  1. Not inside double quotes.
  2. Not between a line break (carriage return) and a colon, i.e. not at the start of a line and immediately followed by a colon.

How can I do that?

1 Answer


You could use this regular expression and code, provided that any double quotes are properly closed, i.e. they occur an even number of times:

let str = `Doka (1) is 20 years old. Doka, (2) Sole are my friends. Sole told me "Doka (3) is a nice gỉrl!"
Doka: (4) student of Hamma school.
I met Doka (5) yesterday.`;

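// Replace "Doka" unless it is inside double quotes or starts a line and is followed by a colon.
// $2 puts back the character captured before "Doka" (empty when there is none).
str = str.replace(/(([^\n\r])Doka|Doka(?!:))(?=([^"]*"[^"]*")*[^"]*$)/g, '$2Bob');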

console.log(str);

Explanation:

  • ([^\n\r])Doka: matches "Doka" and the character preceding it, provided that this preceding character is not a line-break character (i.e. neither a linefeed nor a carriage return). That character is captured in a group (parentheses), so we can restore it during the replacement.

  • |Doka(?!:): if the first alternative does not match, this one is tried. That happens when "Doka" has no preceding character (i.e. it appears at the very start of the string) or is preceded by a line-break character. In that case a match is only allowed when "Doka" is not followed by a colon.

These two alternatives are wrapped in an outer pair of parentheses to delimit the OR (|) operation. That outer group is the first capture group.
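To see which alternative handles which occurrence, here is a small sketch that runs only the alternation part of the pattern (without the quote lookahead) over a shortened, made-up sample and records whether capture group 2 took part in each match. The sample text and the `hits` structure are illustrative assumptions, not part of the answer.

```javascript
// Only the alternation from the answer; the quote lookahead is left out on purpose.
const alternation = /(([^\n\r])Doka|Doka(?!:))/g;
const sample = "Doka (1)\nDoka: (4)\nI met Doka (5)";

const hits = [...sample.matchAll(alternation)].map((m) => ({
  text: m[0],
  index: m.index,
  // Group 2 is undefined when the second alternative, Doka(?!:), produced the match.
  alternative: m[2] === undefined ? "second" : "first",
}));

console.log(hits);
```

The first "Doka" has nothing before it, so the second alternative matches it; "Doka:" on the second line is skipped by both alternatives; and the last occurrence is matched by the first alternative together with its leading space.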

  • (?=([^"]*"[^"]*")*[^"]*$): this lookahead requires that a potential match is followed by an even number of double quotes up to the very end ($) of the string. Provided all quotes are balanced, this is equivalent to requiring that the match is not enclosed in double quotes.
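The quote-parity idea can be checked in isolation with a short sketch: the lookahead from the answer is evaluated at a chosen position using the sticky flag (y), and compared with a plain count of the double quotes remaining after that position. The helper names isOutsideQuotes and countQuotesAfter, and the sample line, are hypothetical and only for illustration.

```javascript
// The quote lookahead from the answer; with the sticky flag (y) the zero-width
// match is attempted exactly at lastIndex and nowhere else.
const evenQuotesAhead = /(?=([^"]*"[^"]*")*[^"]*$)/y;

// Hypothetical helper: true when an even number of quotes follows `index`.
function isOutsideQuotes(text, index) {
  evenQuotesAhead.lastIndex = index;
  return evenQuotesAhead.test(text);
}

// Hypothetical helper: plain count of double quotes after `index`.
function countQuotesAfter(text, index) {
  return (text.slice(index).match(/"/g) || []).length;
}

const line = 'Sole said "Doka (3) is nice" and Doka (6) left.';
const afterQuoted = line.indexOf("Doka") + "Doka".length;    // just after the quoted Doka
const afterPlain = line.lastIndexOf("Doka") + "Doka".length; // just after the unquoted Doka

console.log(isOutsideQuotes(line, afterQuoted), countQuotesAfter(line, afterQuoted)); // false 1
console.log(isOutsideQuotes(line, afterPlain), countQuotesAfter(line, afterPlain));   // true 0
```

An odd count means the position is still inside an open pair of quotes, which is why the lookahead rejects the quoted occurrence.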

The replacement string $2Bob restores capture group 2 (the character matched by [^\n\r], or an empty string when the second alternative matched and the group did not take part) and then inserts "Bob".
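To make the $2 behaviour concrete, the sketch below (assuming Node.js 14 or later for the ?? operator) runs the answer's pattern on the question's text once with the '$2Bob' string and once with an equivalent replacer function. toBob is a hypothetical helper name, not part of the answer. A replacer receives group 2 as undefined when the second alternative matched, which the string form turns into an empty string.

```javascript
const pattern = /(([^\n\r])Doka|Doka(?!:))(?=([^"]*"[^"]*")*[^"]*$)/g;

// Hypothetical function equivalent of '$2Bob': `before` is capture group 2.
function toBob(_match, _group1, before) {
  return (before ?? "") + "Bob";
}

const text = [
  'Doka (1) is 20 years old. Doka, (2) Sole are my friends. Sole told me "Doka (3) is a nice gỉrl!"',
  "Doka: (4) student of Hamma school.",
  "I met Doka (5) yesterday.",
].join("\n");

const expected = [
  'Bob (1) is 20 years old. Bob, (2) Sole are my friends. Sole told me "Doka (3) is a nice gỉrl!"',
  "Doka: (4) student of Hamma school.",
  "I met Bob (5) yesterday.",
].join("\n");

const viaString = text.replace(pattern, "$2Bob");
const viaFunction = text.replace(pattern, toBob);
console.log(viaString === expected, viaFunction === expected); // true true
```

The function form is handy when the replacement needs logic beyond simple $n substitution.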


3 Comments

@qxz, I have added some explanation.
What if the preceding character must also not be a dot? How can I handle that?
You don't want to replace when there is a dot before the word? That is the same kind of requirement as not allowing a newline before it, so just add that character to the list: [^\n\r.]. Are you familiar with the [^ .... ] notation? You can read about regular expressions here.
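A quick check of that suggestion, as a sketch on a made-up sample: adding . to the character class only changes the first alternative, so a dot is then treated exactly like a line break. "Doka" after a dot is still replaced by the second alternative Doka(?!:) unless a colon follows it; the rule becomes "not between a dot and a colon" rather than "never after a dot".

```javascript
// The answer's pattern with "." added to the excluded preceding characters,
// as suggested in the comment above.
const withDot = /(([^\n\r.])Doka|Doka(?!:))(?=([^"]*"[^"]*")*[^"]*$)/g;

const s = "Hi.Doka: (a) and Hi.Doka (b) and Hi Doka (c)";
const out = s.replace(withDot, "$2Bob");
console.log(out); // Hi.Doka: (a) and Hi.Bob (b) and Hi Bob (c)
```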
