I have a list of expressions in an array that I need to find and highlight in a list of sentences, and based on what I find, return some stats. Here is a simplified example of how this works.
listOfExpressions = new Array();
listOfSentences = new Array();
listOfSentences.push("I will not do my own bed very early");
listOfSentences.push("I will eat my lunch around 12:00");
listOfExpressions.push(["will","verb","positive"]);
listOfExpressions.push(["will not","verb","negative"]);
listOfExpressions.push(["bed","noun","common_object"]);
listOfExpressions.push(["very","adverb",""]);
listOfExpressions.push(["my","possessive","singular"]);
For each sentence in listOfSentences, I need to highlight the expressions from listOfExpressions that occur in it, and also return some extra statistics, such as the number of possessives directly followed by a noun (in the first sentence that will be 0, and in the second 1). This should be displayed for every sentence.
My initial idea was to split the sentences by word with something like .replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|"), but that would fail with the "will not" example, because a multi-word expression never lines up with a single word. Of course one could imagine splitting up the entries of listOfExpressions, but that is not possible in this project. I gave a very easy example here; the things I am really looking for are more complicated, and listOfExpressions should be seen as immutable. Another issue with splitting each sentence into an array of words is that it would detect the "will" in "I will not" as an occurrence of "will" rather than of "will not".
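To make that failure concrete, here is a minimal sketch of the word-by-word idea. It assumes a plain whitespace split, and matchWordByWord is a hypothetical helper name that is not part of my project.

```javascript
// Hypothetical helper: compare each whitespace-separated word against
// the first element of every expression entry.
function matchWordByWord(sentence, expressions) {
  const words = sentence.split(/\s+/);
  const found = [];
  for (const word of words) {
    for (const [text] of expressions) {
      if (word === text) {
        found.push(text);
      }
    }
  }
  return found;
}

const expressions = [
  ["will", "verb", "positive"],
  ["will not", "verb", "negative"],
  ["bed", "noun", "common_object"],
  ["very", "adverb", ""],
  ["my", "possessive", "singular"],
];

const found = matchWordByWord("I will not do my own bed very early", expressions);
// "will not" can never equal a single word, so only "will" is reported.
console.log(found);
```

The two-word entry is silently skipped, and the "will" that belongs to "will not" is counted as a positive verb instead.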
So that led me to simply use indexOf() with each item of listOfExpressions. The problem now is that, for the additional stats I mentioned, I also need to know the preceding and following words in order to correctly count the number of possessives directly followed by a noun in the sentence. I guess a nice way to describe this problem is context-dependent substring search.
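As a rough sketch of where the indexOf() approach gets me (findAllOccurrences is a hypothetical helper name, and the shape of the returned objects is just an assumption for illustration):

```javascript
// Hypothetical helper: collect every occurrence of every expression
// using repeated indexOf() calls, recording character offsets.
function findAllOccurrences(sentence, expressionList) {
  const hits = [];
  for (const [text, type] of expressionList) {
    let from = 0;
    let index;
    while ((index = sentence.indexOf(text, from)) !== -1) {
      hits.push({ text, type, start: index, end: index + text.length });
      from = index + 1;
    }
  }
  // Sort by start offset, longer match first when two hits share a start.
  hits.sort((a, b) => a.start - b.start || b.end - a.end);
  return hits;
}

const sampleExpressions = [
  ["will", "verb", "positive"],
  ["will not", "verb", "negative"],
  ["bed", "noun", "common_object"],
  ["very", "adverb", ""],
  ["my", "possessive", "singular"],
];

const hits = findAllOccurrences("I will not do my own bed very early", sampleExpressions);
// "will" and "will not" both start at the same offset, so they overlap.
console.log(hits.map((h) => h.text + "@" + h.start));
```

This gives me character offsets, but "will" and "will not" are both reported at the same position, and nothing in the result says which word comes directly before or after a hit. Since indexOf() matches raw substrings, it would also match an expression inside a longer word.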
So it seems that I'm stuck there, and I am not sure how to proceed. I think it might be an easy problem with an easy solution, and I might be missing something obvious, so I figured an outside look and some advice or ideas would help. It's somewhat of a language-agnostic, algorithmic problem, but JavaScript answers would be most welcome, as it's the language I'm currently using for this project (highlighting text in JS is easy, I think).
Thanks in advance,
Cheers