1

I have a list of expression in an array that I need to look for and highlight in a list of sentences and based on what I find return some stats. Here is a simplistic example of how this work.

listOfExpressions = new Array();
listOfSentences = new Array();
listOfSentences.push("I will not do my own bed very early");
listOfSentences.push("I will eat my lunch around 12:00");

listOfExpressions.push(["will","verb","positive"];
listOfExpressions.push(["will not","verb","negative"]);
listOfExpressions.push(["bed","noun","common_object"]);
listOfExpressions.push(["very","adverb",""]);
listOfExpressions.push(["my","possessive,"singular"]);

I need to highlight for each sentence in listOfSentences the expressions of listOfExpressions that I have found plus return some extra statistics such as the number of possessives directly followed by a noun (in the first sentence that will be 0, and in the second 1), and display this for every sentence.

My initial idea was to split the sentences by word with something like .replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|") but that would fail with the "will not" example... Of course one could imagine splitting up the listOfExpressions but that's not something that is possible in the project. I gave a very easy example here but the things I am looking for are more complicated and the listOfExpressions should be seen as immutable. Another issue with splitting each sentence in an array of words is that it would also detect the "I will not" as a "will" occurence and not as a will not.

So that led me to simply use the indexOf() with each item of the listOfExpressions. The problem with that now, is that I need, for the additional stats I mentioned, to also know about the preceding and following words to be able to correctly count the number of possessives directly followed by a noun in the sentence. I guess a nice way to call this problem is to say that it's a context-dependent substring finding.

So it seems that I'm stuck there. I am not sure on how to proceed from here on... I think it might be an easy problem (and solution) and I might be missing something obvious, so I figured some external look and advice/ideas would help. It's a bit of a language-agnostic and algorithmic problem but I'd love to get some advice on this. Javascript would be more welcome as it's the language I'm currently using for that project (highlighting text in JS is easy, I think).

Thanks in advance,

Cheers

1 Answer 1

1

Am not sure this answers your entire question, but think it will help...

In dealing with computer grammars, I've found that where there is potential for ambiguity, it is best to search for tokens ordered with super sets first. For example, using your data, "will not" is a super set of "will", and therefore, "will not" should be sought prior to "will" as you have already surmised.

Thus, once you have built the listOfExpressions, they can be ordered as follows...

listOfExpressions.sort( (a,b) => { return b[0].localeCompare(a[0]) } )

To increase the complexity of the example, am adding the following to your list above...

listOfExpressions.push(["will not run","x","x"]);
listOfExpressions.push(["will be home","x","x"]);
listOfExpressions.push(["will be back","x","x"]);
listOfExpressions.push(["will be","x","x"]);
listOfExpressions.push(["will not be","x","x"]);

...and then sorting per above, the example results in the following...

0: (3) ["will not run", "x", "x"]
1: (3) ["will not be", "x", "x"]
2: (3) ["will not", "verb", "negative"]
3: (3) ["will be home", "x", "x"]
4: (3) ["will be back", "x", "x"]
5: (3) ["will be", "x", "x"]
6: (3) ["will", "verb", "positive"]
7: (3) ["very", "adverb", ""]
8: (3) ["my", "possessive", "singular"]
9: (3) ["bed", "noun", "common_object"]

...such that now if you search the listOfSentences using this ordered listOfExpressions and indexOf() as you indicated, the routine will seek the super set phrases first, thereby eliminating the ambiguity of the matches...

Hope this helps.

Sign up to request clarification or add additional context in comments.

1 Comment

That's quite smart and it did help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.