2

I need to write a RegEx for use in my JavaScript that matches a set of three consecutive words. The three words are variables called "before", "error" and "after". The "error" word is always present, but because it can sit at the start or the end of the sentence, "before" or "after" may be missing. To illustrate:

If before = "this", after = "that" and error = "fail"

In the sentence: test = "this fail that, but fail is not part of the result but can be in the case it is like this, fail"

The result should be:

this fail that
this fail

Only these two are returned correctly, because they contain the "error" word and at least one of the two side words. There can be symbols between the words, as I don't remove the punctuation.

I'm trying to learn RegEx, but so far I have only managed to retrieve the error word with something like: new RegExp("\\b" + motErreur + "\\b", "gi");

And my attempt for the three words does not seem to work correctly:

pattern = @"(?:^\W*|(?<"+before+">\w+)\W+)" + error + @"(?:\W+(?<"+after+">\w+)|\W*$)";

The pattern is taken from a C# example and I need it in JavaScript; I don't know whether that is what makes it fail.

How can I do this with a simple RegEx? The purpose is then to replace the returned part of the sentence (I already have the function written for that; I am only stuck on this RegEx).

2
  • You are using named capture groups, they are not supported by JS. You are looking for (?:^\W*|(\w+)\W+)fail(?:\W+(\w+)|\W*$), right? Commented Jan 19, 2016 at 12:44
  • @WiktorStribiżew Such a shame, but it explains why it did not work. In your RegEx, where do before and after go? Without them I will get some matches that I don't want. Commented Jan 19, 2016 at 12:45

3 Answers

1

Since you are using the pattern in JS, you need to use a constructor notation and use numbered capture groups rather than (?<name>....) named ones:

var before= "this", after = "that", error="fail";

var re = RegExp("(?:^\\W*|(" + before.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + ")\\W+)" + error.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "(?:\\W+(" + after.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + ")|\\W*$)", "g"); 
var str = 'this fail that, but fail is not part of the result but can be in the case it is like this fail';
var m;
while ((m = re.exec(str)) !== null) {
    document.body.innerHTML += m[0] + "<br/>";
}

Note:

  • Since you are building the pattern dynamically, you need to use a constructor notation (RegExp(...))
  • In constructor notation, \ must be doubled
  • As the words can contain special regex metacharacters, they must be escaped to be treated as literals (I added .replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
  • The context words are in m[1] and m[2]. Check if they are not undefined before use.
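A minimal sketch of how the snippet above could be wrapped for reuse, assuming two helper names, escapeRegExp and findErrorContexts, that are hypothetical and not part of the answer. It factors the repeated escaping into one function and returns each match together with its optional context words:

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: collect every "before error after", "before error"
// (error at the end of the text) or "error after" (error at the start).
function findErrorContexts(text, before, error, after) {
    var re = new RegExp(
        "(?:^\\W*|(" + escapeRegExp(before) + ")\\W+)" +
        escapeRegExp(error) +
        "(?:\\W+(" + escapeRegExp(after) + ")|\\W*$)",
        "g"
    );
    var results = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        results.push({
            match: m[0],
            before: m[1], // undefined when the error word starts the text
            after: m[2]   // undefined when the error word ends the text
        });
    }
    return results;
}

var found = findErrorContexts(
    "this fail that, but fail is not part of the result but can be in the case it is like this, fail",
    "this", "fail", "that"
);
console.log(found.map(function (r) { return r.match; }));
// logs [ 'this fail that', 'this, fail' ]
```

With the sentence from the question, the second match keeps the comma because \W+ spans the punctuation between the words.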

12 Comments

The problem with your answer is that fail is omitted, and a result like "but fail is" should not be counted as a good result because there are no side words around it, just two other words.
OK, fail is there, just display m[0], but I do not get the issue with but fail is. Do you mean that all the context words are set? Must they be passed as arguments? Please check this updated snippet.
The snippet seems really good. I'm going to give it a try with some cases and see the result.
The .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") is necessary if before has a special regex metacharacter like . or +, etc. Those characters must be escaped (a \ should be inserted before those symbols so that they are treated as literal symbols).
Okay, so this is totally what I needed, thanks. I got a few bugs, but they come from the JS and I will be able to correct them myself.
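To illustrate the escaping point from the comments, here is a small sketch with a made-up word, "a.b", that contains a metacharacter. The helper name escapeRegExp is an assumption for illustration; it only wraps the .replace call from the answer:

```javascript
// Hypothetical helper wrapping the escaping expression from the answer.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var word = "a.b"; // made-up example word containing a metacharacter

// Unescaped, the dot means "any character", so "axb" matches too.
var loose = new RegExp(word);
console.log(loose.test("axb")); // true

// Escaped, the dot is a literal dot.
var strict = new RegExp(escapeRegExp(word));
console.log(strict.test("axb")); // false
console.log(strict.test("a.b")); // true
```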
1

If I understood the question correctly, try (this\s+fail\s+that|this\s+fail|fail\s+that).

3 Comments

In my case they are variables: before, error, after. I'm going to try it and see. By the way, there can be symbols between the words.
This will match fail that anywhere in the string, whereas in the absence of this it's supposed to match only at the beginning of the string, if I understand correctly.
I didn't fully understand the question. If so, use (this\W+fail\W+that|this\W+fail$|^fail\W+that) if any characters except A-Z, a-z, 0-9 or _ are considered symbols.
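The pattern from the last comment can also be built from the three variables. A rough sketch, assuming the words contain no regex metacharacters (otherwise they would need escaping first); the names sentence and alternation are illustrative only:

```javascript
var before = "this", error = "fail", after = "that";

// Assumption: before, error and after contain no regex metacharacters.
var alternation = new RegExp(
    "(" + before + "\\W+" + error + "\\W+" + after +  // all three words
    "|" + before + "\\W+" + error + "$" +             // error ends the text
    "|^" + error + "\\W+" + after + ")",              // error starts the text
    "g"
);

var sentence = "this fail that, but fail is not part of the result but can be in the case it is like this, fail";
var matches = sentence.match(alternation);
console.log(matches); // [ 'this fail that', 'this, fail' ]
```

Unlike the pattern with ^\W* and \W*$, this version does not allow extra punctuation before a leading error word or after a trailing one.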
1

Using a regexp and the exec method to find each match:

var rgx = new RegExp("(" + before + "\\\s*" + error + "\\\s*(" + after + ")*)", "g")
var resultArray = rgx.exec(test);

The matching item in resultArray is the one at index 1 (i.e. resultArray[1]). Call the exec method repeatedly while the result is not null to find all matching items.

So you could write a function:

function getMatches(str)
{
    var before= "this";
    var after = "that";
    var error = "fail";
    var array = new Array();
    var rgx = new RegExp("(" + before + "\\\s*" + error + "\\\s*(" + after + ")*)", "g")
    // Search the str parameter rather than a global variable
    var matches = rgx.exec(str);
    while(matches != null)
    {
        array.push(matches[1]);
        matches = rgx.exec(str);
    }

    return array;
}

5 Comments

This will not work for the case "this, fail", as the after anchor is not there because fail is at the end of the sentence.
This will not work if before, or after or error contain special regex metacharacters. Also, \\s should be used, not \\\s in the constructor notation.
If you need all punctuation marks, you can replace \\\s* by [\\\s,.;:!?]* Also, the 3 slashed s works properly while your 2 slashed s don't: try it.
@ADreNaLiNe-DJ: I do not have to try, I always use double slashes to introduce a literal escape symbol in C strings in JS. No need to use triple slashes.
In your example you are in a "simple string". Here we are dealing with a string representing a regexp, and a double slash is not enough to make it work properly. If you don't try, you'll never see. I tried both (2 and 3 slashes) and the result is different. Only 3 slashes worked.
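The two spellings discussed above can be compared directly. A quick check, relying on the rule that in a JavaScript string literal "\\" produces one backslash and an unrecognised escape such as "\s" produces a plain "s"; the variable names two and three are illustrative:

```javascript
var two = "\\s*";    // backslash, s, *
var three = "\\\s*"; // "\\" gives a backslash, "\s" gives s, then *

console.log(two === three); // true
console.log(three.length);  // 3

// Both therefore build the same regular expression.
var reTwo = new RegExp("this" + two + "fail");
var reThree = new RegExp("this" + three + "fail");
console.log(reTwo.source === reThree.source); // true
```

If the two versions behaved differently in a real test, the difference most likely came from something else in the code being compared.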
