7

I'm very close on this. I got a question posed to me yesterday by a developer if I could have a look at this.

I feel close, but I think some people here would appreciate the challenge too and I am lost.

If I have a List<string> which has the following members:

Today

Monday

Tuesday

Wednesday

I want to get a return string day because this is the largest common string in the List<string>. This should be done irrespective of position and string length, just want to find the largest length common string in a host of strings.

My attempt failed a bit miserably, I selected:

Monday - Tuesday

Monday - Wednesday

And then did an Intersect between each. Obviously this would return multiple strings, however for Monday - Wednesday you get nday because that is what letters it has common.

Here is my code:

  List<string> strs = new List<string>();
  strs.Add("Monday");
  strs.Add("Tuesday");
  strs.Add("Wednesday");

  var v = strs.SelectMany((day, i) => strs.Select((day2, j) => new
  {
    iDay = i,
    Day = day,
    iDay2 = j,
    Day2 = day2
  })).Where(x => x.iDay != x.iDay2).Select(x => new string(x.Day.Intersect(x.Day2).ToArray()));

Anybody have a nice and neat solution?

NOTE

It doesn't have to be LINQ

If there isn't a common string, return null or empty string.

6
  • In my view, a nice and neat solution will not involve huge lambda expressions. And will not necessarily involve Linq. Commented Nov 22, 2012 at 9:20
  • What's the outcome if there are no common strings? e.g. Bob, Fred, Max? Or Monday, Tuesday, Bill? i.e. there's not so much as a single character common to all items in the list? Commented Nov 22, 2012 at 9:24
  • @IlyaKogan Doesn't have to be LINQ, this was just my approach - tagged it because that's what I have used in the question. Commented Nov 22, 2012 at 9:24
  • @NeilMoss If there isn't a common string/set of common characters - then an empty string or null. Commented Nov 22, 2012 at 9:25
  • 1
    Might try google for "Common longest substring". There's plenty of algorithms. Pobably wouldn't use LINQ because of readability. Commented Nov 22, 2012 at 9:26

2 Answers 2

7

This works better than my first approach(striked out).

You can use following extension to get all substrings of the shortest string in the list(for efficiency):

public static IEnumerable<string> getAllSubstrings(this string word)
{
    return from charIndex1 in Enumerable.Range(0, word.Length)
           from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
           where charIndex2 > 0
           select word.Substring(charIndex1, charIndex2);
}
  • now order these substrings by Length(longest first)
  • look if all other strings(excluding the string itself because that test is redundant) contain that substring (Enumerable.All returns immediately if one string doesn't contain a given substring)
  • if one string appears in all others you have found the longest common substring
  • otherwise repeat that until you've checked all substrings(if no common string was found)

string shortest = list.OrderBy(s => s.Length).First();
IEnumerable<string> shortestSubstrings = shortest
    .getAllSubstrings()
    .OrderByDescending(s => s.Length);
var other = list.Where(s => s != shortest).ToArray();
string longestCommonIntersection = string.Empty;
foreach (string subStr in shortestSubstrings)
{
    bool allContains = other.All(s => s.Contains(subStr));
    if (allContains)
    {
        longestCommonIntersection = subStr;
        break;
    }
}

DEMO

Sign up to request clarification or add additional context in comments.

5 Comments

This only works for substrings at the end of the string (such as the example), so it's not "irrespective of position". Also, wouldn't it be more efficient to find the largest common substring of the first two, then between that and the third etc.?
@Rawling: oh yes, missed that completely. Why do you think that it would be more efficient? Enumerable.All just checks until one string does not contain that substring so it would be efficient if it would be correct.
@Rawling If you have something like Thinking, Sink, Linked, Inked you would expect Ink out - so your right. If Thinking was Think then the above solution would work, however in my case this is a little bit off what I wanted - very close however so thanks :)
@Tim I'm not so sure about the efficiency TBH.
Made it even more efficient(taking the first string is not optimal, better use the shortest string).
3

Find the shortest entry in the list.

  • Today
  • Monday
  • Tuesday
  • Wednesday

So we use "Today".

Build a list of strings of consecutive characters in "Today" of the length of the string down to each character, in "longest first" order.

"Today",

"Toda", "oday",

"Tod", "oda", "day",

"To", "od", "da", "ay",

"t", "o", "d", "a", "y"

Enumerate over this list, finding the first entry for which all the other strings contain that entry.

        List<string> words = new List<string> { "Today", "Monday", "Tuesday", "Wednesday" };

        // Select shortest word in the list
        string shortestWord = (from word in words
                            orderby word.Length
                            select word).First();

        int shortWordLength = shortestWord.Length;

        // Build up the list of consecutive character strings, in length order.
        List<string> parts = new List<string>();
        for (int partLength = shortWordLength; partLength > 0; partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                parts.Add(shortestWord.Substring(partStartIndex, partLength));
            }
        }
        // Find the first part which is in all the words.
        string longestSubString = (from part in parts where words.All(s => s.Contains(part)) select part).FirstOrDefault();

       // longestSubString is the longest part of all the words, or null if no matches are found.

EDIT

Thinking a little more about it, you can optimise a little.

You don't need to build up a list of parts - just test each part as it is generated. Also, by sorting the word list in length order, you always test against the shortest strings first to reject candidate parts more quickly.

        string longestSubString = null;

        List<string> words = new List<string> { "Todays", "Monday", "Tuesday" };

        // Sort word list by length
        List<string> wordsInLengthOrder = (from word in words
                                           orderby word.Length
                                           select word).ToList();

        string shortestWord = wordsInLengthOrder[0];
        int shortWordLength = shortestWord.Length;

        // Work through the consecutive character strings, in length order.
        for (int partLength = shortWordLength; (partLength > 0) && (longestSubString == null); partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                string part = shortestWord.Substring(partStartIndex, partLength);

                // Test if all the words in the sorted list contain the part.
                if (wordsInLengthOrder.All(s => s.Contains(part)))
                {
                    longestSubString = part;
                    break;
                }
            }

        }

        Console.WriteLine(longestSubString);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.