I have looked at several examples of how to use regex in JS, but I cannot seem to find the right syntax for what I need. I have an array of words:

commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"]

and a string:

'She met him where he liked to eat "the best" cheese pizza.'

Basically I want to use non-alphas and my array of commonWords as delimiters for extracting phrases. The above would yield something like this:

'met, where, to eat, the best, cheese pizza'
  • The response should be: 'met, where, to eat, the best, cheese pizza'. "liked" is in the commonWords list. Commented Jun 29, 2010 at 10:59

3 Answers

From the OP:

"Basically I want to use non-alphas and my array of commonWords as delimiters for extracting phrases."

This does both (unlike some other answers ;-) ). It produces the result both as a string and as an array.

var commonWords = ["she", "he", "him", "liked", "i", "a", "an", "are"];
var SourceStr   = 'She met him where he liked to eat "the best" cheese pizza, didn\'t she, $%&#! Mr. O\'Leary?';

//--- Replace (most) non-alphas and the common words with a tab.
var zRegEx      = new RegExp ("([^0-9a-z' ]+)|\\s*\\b(" + commonWords.join ("|") + ")\\b\\s*", "ig");
var sPhraseList = SourceStr.replace (zRegEx, '\t');

//-- Trim empty results and leading and trailing delimiters.
sPhraseList     = sPhraseList.replace (/ *\t+ */g, ', ').replace (/, ?, ?/g, ', ');
sPhraseList     = sPhraseList.replace (/(^[, ]+)|([, ]+$)/g, '');

//-- Make optional array:
var aPhraseList = sPhraseList.split (/, */g);

//-- Replace "console.log" with "alert" if you're not using Firebug.
console.log (SourceStr);
console.log (sPhraseList);
console.log (aPhraseList);

This returns:

"met, where, to eat, the best, cheese pizza, didn't, Mr, O'Leary"

and

["met", "where", "to eat", "the best", "cheese pizza", "didn't", "Mr", "O'Leary"]
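
The same idea can be sketched as a reusable function that builds the pattern with `new RegExp` and splits on it directly. This is only a sketch under assumptions: `extractPhrases` and `escapeRegExp` are hypothetical names that do not appear in the answer above, and the escaping step assumes a common word might contain regex metacharacters. Unlike the code above, it also treats a period as a delimiter.

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split a sentence into phrases. Delimiters are the common
// words and any run of characters other than letters, digits, apostrophes and spaces.
function extractPhrases(source, commonWords) {
  var words = commonWords.map(escapeRegExp).join("|");
  // Non-capturing group, so split() does not put the matched words into the result.
  var delimiter = new RegExp("[^0-9a-z' ]+|\\b(?:" + words + ")\\b", "ig");
  return source
    .split(delimiter)
    .map(function (piece) { return piece.trim(); })
    .filter(function (piece) { return piece.length > 0; });
}

var commonWords = ["she", "he", "him", "liked", "i", "a", "an", "are"];
var phrases = extractPhrases('She met him where he liked to eat "the best" cheese pizza.', commonWords);
console.log(phrases.join(", ")); // met, where, to eat, the best, cheese pizza
```

Splitting and then trimming each piece avoids the intermediate tab-delimited string, at the cost of creating a few empty pieces that have to be filtered out.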

Are you looking for something like this:

var commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"];
var regstr = "\\b(" + commonWords.join("|") + ")\\b";
//regex is \b(she|he|him|liked|i|a|an|are)\b
var regex = new RegExp(regstr, "ig");
var str = 'She met him where he liked to eat "the best" cheese pizza.';
console.log(str.replace(regex, ""));

Output (the removed words leave their surrounding spaces behind):

 met  where   to eat "the best" cheese pizza.

Split version:

var commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"];
var regstr = "\\b(?:" + commonWords.join("|") + ")\\b";
var regex = new RegExp(regstr, "ig");
var str = 'She met him where he liked to eat "the best" cheese pizza.';
var arr = str.split(regex);
console.log(arr);// ["", " met ", " where ", " ", ' to eat "the best" cheese pizza.']

for(var i = 0; i < arr.length; i++) {
  if(arr[i].match(/^\s*$/)) { //remove empty strings and strings with only spaces.
    arr.splice(i--, 1);
  } else {
    arr[i] = arr[i].replace(/^\s+|\s+$/g, ""); //trim spaces from beginning and end
  }
}

console.log(arr);// ["met", "where", 'to eat "the best" cheese pizza.']
console.log(arr.join(", "));// met, where, to eat "the best" cheese pizza.

1 Comment

Nice one. The OP wants split instead of replace, but it's similar enough (i.e., remove the capturing group, and maybe the empty tokens).
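
The point about the capturing group can be illustrated with a small sketch (the sample string and variable names are made up for illustration): when the pattern passed to `split` contains a capturing group, the captured text is spliced into the resulting array, whereas a non-capturing group `(?:...)` leaves it out.

```javascript
var sample = "met him where";

// Capturing group: the matched delimiter shows up in the result.
var withCapture = sample.split(/\b(him)\b/i);

// Non-capturing group: only the text between the delimiters is returned.
var withoutCapture = sample.split(/\b(?:him)\b/i);

console.log(withCapture);    // ["met ", "him", " where"]
console.log(withoutCapture); // ["met ", " where"]
```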

This version is quite verbose, but works with “lazy” single and double quotes as well.

First, a check for whether an array contains an object (like indexOfObject), with a flag for case-insensitive comparison:

if (!Array.prototype.containsObject) Array.prototype.containsObject = function (object, caseInsensitive) {

    for (var i = 0; i < this.length; i++) {

        if (this[i] == object) return true;

        if (!(caseInsensitive && (typeof this[i] == 'string') && (typeof object == 'string'))) continue;

        // Compare whole strings, and keep scanning when this element does not match.
        if (this[i].toLowerCase() == object.toLowerCase()) return true;

    }

    return false;

}

Push object to the array if not empty:

if (!Array.prototype.pushIfNotEmpty) Array.prototype.pushIfNotEmpty = function (object) {

    if (typeof object == 'undefined') return;
    if ((object && object.length) <= 0) return;

    this.push(object);

}

Canonicalizing strings:

function canonicalizeString (inString, whitespaceSpecifier) {

    if (typeof inString != 'string') return '';
    if (typeof whitespaceSpecifier != 'string') return '';

    var whitespaceReplacement = whitespaceSpecifier + whitespaceSpecifier;
    var canonicalString = inString.replace(whitespaceSpecifier, whitespaceReplacement);

    // match() returns null when nothing matches, so fall back to an empty array.
    var singleQuotedTokens = canonicalString.match(/'([^'s][^']*)'/ig) || [];
    for (var i = 0; i < singleQuotedTokens.length; i++) canonicalString = canonicalString.replace(singleQuotedTokens[i], singleQuotedTokens[i].split(" ").join(whitespaceReplacement));

    var doubleQuotedTokens = canonicalString.match(/"([^"]*)"/ig) || [];
    for (var j = 0; j < doubleQuotedTokens.length; j++) canonicalString = canonicalString.replace(doubleQuotedTokens[j], doubleQuotedTokens[j].split(" ").join(whitespaceReplacement));

    return canonicalString;

}

Have fun:

function getSignificantTokensFromStringWithCommonWords (inString, inCommonWordsArray) {

    if (typeof inString != 'string') return [];
    if (typeof (inCommonWordsArray && inCommonWordsArray.length) != 'number') return [];

    var canonicalString = canonicalizeString(inString, "_");

    var commonWords = [];
    for (var i = 0; i < inCommonWordsArray.length; i++) commonWords.pushIfNotEmpty(canonicalizeString(inCommonWordsArray[i], "_"));

    var tokenizedStrings = canonicalString.split(" ");

    // Keep only the tokens that are not common words, restoring the spaces inside quoted tokens.
    var responseObject = [];
    for (var j = 0; j < tokenizedStrings.length; j++) {
        if (commonWords.containsObject(tokenizedStrings[j], true)) continue;
        responseObject.push(tokenizedStrings[j].split("__").join(" "));
    }

    return responseObject;

}

1 Comment

You’ll call getSignificantTokensFromStringWithCommonWords(inString, inCommonWordsArray) to work with this snippet.
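
For comparison, the word-by-word filtering can be sketched without extending `Array.prototype`. This is a swapped-in approach, not the answer's own method: `filterCommonWords` is a hypothetical name, and the sketch assumes that splitting on whitespace is enough, so it leaves out the quoted-phrase handling above.

```javascript
// Hypothetical alternative: drop common words using a lowercase lookup table.
function filterCommonWords(inString, inCommonWordsArray) {
  var lookup = {};
  for (var i = 0; i < inCommonWordsArray.length; i++) {
    lookup[inCommonWordsArray[i].toLowerCase()] = true;
  }

  var tokens = inString.split(/\s+/);
  var result = [];
  for (var j = 0; j < tokens.length; j++) {
    var key = tokens[j].toLowerCase();
    // Skip empty tokens and anything found in the lookup table.
    if (tokens[j] && !Object.prototype.hasOwnProperty.call(lookup, key)) {
      result.push(tokens[j]);
    }
  }
  return result;
}

console.log(filterCommonWords("She met him where he liked to eat", ["she", "he", "him", "liked"]));
// ["met", "where", "to", "eat"]
```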
