
I have a list of sentences to compare against a set of words in an array. I can already compare the array with the list and get the sentences that contain any of the words from the array.

I can also sort the list in descending order by the number of word occurrences.
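The filter-and-count step described above can be sketched in LINQ. This is a minimal sketch assuming case-insensitive substring matching and C# 9 top-level statements; `MatchCount`, `ranked` and the non-matching "Samsung" sentence are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentences = new List<string>
{
    "Realme phone has LED screen with 6GB RAM capacity.",
    "Samsung tablet has OLED screen.",  // illustrative non-matching sentence
    "Realme phone has LCD screen with 6GB RAM capacity."
};
string[] searchWords = { "Realme", "phone", "LCD" };

// How many of the search words occur in the sentence (case-insensitive substring test).
int MatchCount(string sentence) =>
    searchWords.Count(w => sentence.Contains(w, StringComparison.OrdinalIgnoreCase));

// Keep sentences with at least one match, then sort by match count, highest first.
// OrderByDescending is stable, so ties keep their original relative order.
var ranked = sentences
    .Where(s => MatchCount(s) > 0)
    .OrderByDescending(s => MatchCount(s))
    .ToList();

foreach (var line in ranked) Console.WriteLine(line);
```

Because this is a plain substring test, "phone" also matches inside "smartphone"; the exact vs. partial rules need a separate check.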

For example:

List<string> sourceList = new List<string>()
{
    "Realme smartphone has super Amoled screen with 4GB RAM capacity.",
    "Realme smartphone has LCD screen with 4GB RAM capacity.",
    "Realme phone has LCD screen with 6GB RAM capacity.",
    "Realme phone has LED screen with 6GB RAM capacity.",
    "Realme has smartphone with super Amoled screen with 4GB RAM and 4GB extended memory capacity",
    "Realme has LCD phone with 6GB RAM capacity."
};

string[] searchStr = new string[3] { "Realme", "phone", "LCD" };

Expected sorted list:

List<string> sortedList = new List<string>()
{
    "Realme phone has LCD screen with 6GB RAM capacity.",
    "Realme smartphone has LCD screen with 4GB RAM capacity.",
    "Realme has LCD phone with 6GB RAM capacity.",
    "Realme phone has LED screen with 6GB RAM capacity.",
    "Realme smartphone has super Amoled screen with 4GB RAM capacity.",
    "Realme has smartphone with super Amoled screen with 4GB RAM and 4GB extended memory capacity"
};

The reasons for this order are:

  • The first sentence contains all 3 search words, “Realme”, “phone” and “LCD”, as exact words and in the same order.
  • The second sentence contains all 3 words in the same order, but “phone” only as part of “smartphone”.
  • The third sentence contains all 3 words, but not in the same order.
  • The fourth sentence contains 2 exact words in the same order.
  • The fifth sentence contains 2 words in order, but not both as exact words.
  • The sixth sentence contains 2 words, but the search term “phone” only occurs at the third position in the sentence (inside “smartphone”).

The sort priority is:

  1. Number of word occurrences.
  2. Words appearing in the same order as the search terms.
  3. Position of the word occurrences.
  4. Exact word match.
  5. Partial word match.
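The priority list maps naturally onto a LINQ `OrderByDescending`/`ThenBy` chain, one key per rule. The sketch below is one interpretation, not the definitive one: it assumes case-insensitive matching, splits words on spaces, periods and commas, reads "position" as the sum of the word indexes where each term first appears, and uses C# 9 top-level statements. Under those assumptions it produces the expected sorted list for the sample data. The helper names (`Words`, `HasTerm`, `InOrder`, `PositionSum`, etc.) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sourceList = new List<string>
{
    "Realme smartphone has super Amoled screen with 4GB RAM capacity.",
    "Realme smartphone has LCD screen with 4GB RAM capacity.",
    "Realme phone has LCD screen with 6GB RAM capacity.",
    "Realme phone has LED screen with 6GB RAM capacity.",
    "Realme has smartphone with super Amoled screen with 4GB RAM and 4GB extended memory capacity",
    "Realme has LCD phone with 6GB RAM capacity."
};
string[] searchStr = { "Realme", "phone", "LCD" };

// Word splitting; the delimiter set is an assumption.
string[] Words(string s) => s.Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
bool HasTerm(string word, string term) => word.Contains(term, StringComparison.OrdinalIgnoreCase);
bool IsExact(string word, string term) => string.Equals(word, term, StringComparison.OrdinalIgnoreCase);

// 1. Occurrences: every word containing a term counts, so repeated terms raise the score.
int Occurrences(string s) => Words(s).Sum(w => searchStr.Count(t => HasTerm(w, t)));

// 2. Order: how many terms are found one after another in searchStr order.
int InOrder(string s)
{
    var words = Words(s);
    int pos = -1, score = 0;
    foreach (var t in searchStr)
    {
        pos = Array.FindIndex(words, pos + 1, w => HasTerm(w, t));
        if (pos < 0) break;
        score++;
    }
    return score;
}

// 3. Position: sum of the word indexes where each found term first appears (lower is better).
int PositionSum(string s)
{
    var words = Words(s);
    return searchStr.Select(t => Array.FindIndex(words, w => HasTerm(w, t)))
                    .Where(i => i >= 0)
                    .Sum();
}

// 4. Terms matched as a whole word; 5. terms matched only inside a longer word.
int ExactCount(string s) => searchStr.Count(t => Words(s).Any(w => IsExact(w, t)));
int PartialCount(string s) =>
    searchStr.Count(t => Words(s).Any(w => HasTerm(w, t)) && !Words(s).Any(w => IsExact(w, t)));

var sortedList = sourceList
    .OrderByDescending(s => Occurrences(s))
    .ThenByDescending(s => InOrder(s))
    .ThenBy(s => PositionSum(s))
    .ThenByDescending(s => ExactCount(s))
    .ThenByDescending(s => PartialCount(s))
    .ToList();

foreach (var line in sortedList) Console.WriteLine(line);
```

LINQ's `OrderBy`/`ThenBy` sorts are stable, so sentences that tie on every key keep their original relative order.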

Also, if a word occurs more than once in a sentence, every occurrence should be counted for the highest-priority criterion (number of word occurrences).
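The `CountWordUniqueOccurrences` helper in the question's code returns at most 1 per word, so repeated words are not counted. A minimal sketch of counting every occurrence, assuming non-overlapping, case-insensitive substring matches (so "phone" inside "smartphone" counts). `CountOccurrences` is a hypothetical name, and "4GB" is included only because it occurs twice in the sample sentence:

```csharp
using System;
using System.Linq;

// Counts non-overlapping, case-insensitive occurrences of `pattern` in `text`.
// Substring semantics are assumed, so "phone" is also found inside "smartphone".
int CountOccurrences(string text, string pattern)
{
    if (string.IsNullOrEmpty(pattern)) return 0;
    int count = 0;
    int index = text.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
    while (index >= 0)
    {
        count++;
        // Continue searching after the current match.
        index = text.IndexOf(pattern, index + pattern.Length, StringComparison.OrdinalIgnoreCase);
    }
    return count;
}

var sentence = "Realme has smartphone with super Amoled screen with 4GB RAM and 4GB extended memory capacity";
string[] terms = { "Realme", "phone", "4GB" };

// Total score for the sentence: every occurrence of every term counts.
int total = terms.Sum(t => CountOccurrences(sentence, t));
Console.WriteLine(total);  // prints 4
```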

I have this code to get the count:

private List<string> GetMyList(List<string> strLst)
{
    // Maps each sentence to the number of search words it contains.
    Dictionary<string, int> dctList = new Dictionary<string, int>();
    var wrdList = new string[3] { "Realme", "phone", "LCD" };

    foreach (string str in strLst)
    {
        int i = 0;
        foreach (string wrd in wrdList)
        {
            // Case-insensitive: compare the lower-cased sentence and word.
            i += CountWordUniqueOccurrences(str.Trim().ToLower(), wrd.Trim().ToLower());
        }
        dctList.Add(str, i);
    }

    // Sort by score and return the sentences directly. Converting the sorted
    // sequence back into a Dictionary (ToDictionary) is not safe, because a
    // Dictionary does not guarantee any enumeration order.
    return dctList.OrderByDescending(x => x.Value).Select(x => x.Key).ToList();
}

// Returns 1 if pattern occurs anywhere in text, otherwise 0;
// repeated occurrences are not counted.
private int CountWordUniqueOccurrences(string text, string pattern)
{
    return text.Contains(pattern) ? 1 : 0;
}

Can someone help me identify the logic to achieve this?

  • You have 4 problems to solve: 1) counting the occurrences of a word in a string, 2) finding the position of a word in a string, 3) finding the "Word order in the exact sequence." (whatever that means) and 4) sorting by the values in that order. Have you solved any of these? What are you stuck on? Commented Sep 21, 2021 at 13:53
  • @DStanley - I have solved counting the occurrences of a word in a string. Using Dictionary<string,int> I stored the sentence and word count. Commented Sep 21, 2021 at 13:55
  • If it were me, I would write a function that takes a string and a list of strings, checks each of the criteria, and outputs the expected sort order (1, 2, 3...N). Then just call that function from your Linq query (OrderBy(s => SortOrder(s, searchStr))). You could even have that function call sub-functions for each of the sorting criteria (if (StringContainsWordsInOrder(s, searchStr))). My point is to break the problem into smaller sub-problems, then pull those together for a complete solution. Commented Sep 21, 2021 at 14:01
  • I think you have a 4th priority of matches for an entire word vs partial matches. So "phone" matching "I'm a phone" sorts higher than "I'm a smartphone". Commented Sep 21, 2021 at 14:02
  • @DStanley - I've updated the question with my code. Can you please help me with the logic? Commented Sep 21, 2021 at 14:05
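The whole-word vs. partial distinction raised in the comments can be captured in a small helper, which can then feed a `SortOrder`-style key as suggested there. A sketch assuming the .NET regex word boundary `\b` is an acceptable definition of "whole word"; `MatchKind` and `Score` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Returns 2 for a whole-word match, 1 for a match only inside a longer word, 0 if absent.
// Case-insensitive; "whole word" means bounded by \b (letters, digits and '_' are word characters).
int MatchKind(string sentence, string term)
{
    if (Regex.IsMatch(sentence, $@"\b{Regex.Escape(term)}\b", RegexOptions.IgnoreCase)) return 2;
    return sentence.Contains(term, StringComparison.OrdinalIgnoreCase) ? 1 : 0;
}

string[] searchStr = { "Realme", "phone", "LCD" };
var a = "Realme phone has LCD screen with 6GB RAM capacity.";
var b = "Realme smartphone has LCD screen with 4GB RAM capacity.";

// Per-term match kinds, usable as sort keys (higher is better).
Console.WriteLine(string.Join(",", searchStr.Select(t => MatchKind(a, t))));  // 2,2,2
Console.WriteLine(string.Join(",", searchStr.Select(t => MatchKind(b, t))));  // 2,1,2

// One possible combined key in the spirit of the SortOrder(s, searchStr) idea.
int Score(string s) => searchStr.Sum(t => MatchKind(s, t));
Console.WriteLine($"{Score(a)} {Score(b)}");  // 6 5
```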

1 Answer

Here's something to start with:

using System;
using System.Collections.Generic;
using System.Linq;

List<string> sourceList = new List<string>()
{
    "Realme smartphone has super Amoled screen with 4GB RAM capacity.",
    "Realme smartphone has LCD screen with 4GB RAM capacity.",
    "Realme phone has LCD screen with 6GB RAM capacity.",
    "Realme phone has LED screen with 6GB RAM capacity.",
    "Realme has smartphone with super Amoled screen with 4GB RAM and 4GB extended memory capacity",
    "Realme has LCD phone with 6GB RAM capacity."
};

var searchStr = new string[3]{ "Realme", "phone", "LCD" };

// We use Linq to order the list. The most important criterion comes first.
var result = sourceList
    .OrderByDescending(CountWords(searchStr))
    .ThenByDescending(CountOrderedWords(searchStr))
    .ThenByDescending(CountExactWords(searchStr))
    .ToList();

// Print the sorted sentences.
foreach (var line in result)
    Console.WriteLine(line);

// Counts how many of the search terms appear in the string (case-sensitive substring match)
Func<string, int> CountWords(params string[] terms) => s => terms.Count(t => s.Contains(t));

// Counts how many search terms appear one after another in the given order,
// stopping at the first term that is missing
Func<string, int> CountOrderedWords(params string[] terms) => s => {
    var i = 0;
    var score = 0;
    foreach(var t in terms) {
        i = s.IndexOf(t, i);
        if (i < 0) break;
        i += t.Length; // continue searching after this match
        score++;
    }
    
    return score;
};

// Counts how many words match exactly (you may want to update the word delimiters)
Func<string, int> CountExactWords(params string[] terms) =>
    s => terms.Count(t => s.Split(' ', '.', ',').Contains(t));

However, the semantics of your task are not very clear to me, and this is not a coding service, so I'll leave the rest for you to figure out. You can still ask another question if you run into new issues.

Here's the .NET Fiddle I played with; maybe you'll find it useful: https://dotnetfiddle.net/NOBAgT


2 Comments

This gives me a seemingly random result set, not the expected one.
What result do you get? It should not be random. As I wrote, I didn't include all of your rules, but I'll gladly help you improve it further.
