I'm not sure what the criteria for determining a match are, so I've written this. It's completely novel, and there's a 99.9999% chance that it's not actually what you want.
Essentially, the outer select takes a prefix of the determined length from each item.
The first inner select determines, for each string, the length of its longest prefix that at least one other string in the list also starts with.
The group by (following the first inner select) groups the found lengths by themselves.
This grouping is then converted to a dictionary of the length versus the number of times it was found.
We then order those pairs by how often each length was found (Value), ascending.
Next, we take the actual length (the least frequently occurring one, from Key) and feed it into the second parameter of Substring, so we take that many characters from the start of each string. Of course, we're back in the outer select now, so we're actually getting values (hooray!).
Now, we take the distinct set of values from that result and voila!
    list.Select(
        item => item.Substring(0,
            list.Select(
                    innerItem => Enumerable.Range(1, innerItem.Length)
                        .Reverse()
                        .First(proposedLength => list.Count(innerInnerItem => innerInnerItem.StartsWith(innerItem.Substring(0, proposedLength))) > 1)
                )
                .GroupBy(length => length)
                .ToDictionary(grouping => grouping.Key, grouping => grouping.Count())
                .OrderBy(pair => pair.Value)
                .Select(pair => pair.Key)
                .First())
    ).Distinct()
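To make the steps easier to follow, here's a rough sketch of the same query split into named stages. The sample list is made up (I don't know your real data), the variable names are mine, and passing StringComparison.Ordinal to StartsWith is my own choice rather than part of the query above. It's written as C# 9 top-level statements. Since the inner query never references the outer item, the sketch works out the length once instead of once per item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, only for illustration.
var list = new List<string> { "internet", "internal", "interview", "interval", "inside" };

// Step 1: for each item, the length of its longest prefix that at least
// one other item also starts with (the item itself counts once, hence "> 1").
// First throws InvalidOperationException if an item shares no prefix at all.
var sharedLengths = list
    .Select(item => Enumerable.Range(1, item.Length)
        .Reverse()
        .First(len => list.Count(other =>
            other.StartsWith(item.Substring(0, len), StringComparison.Ordinal)) > 1))
    .ToList();

// Step 2: group equal lengths, order by how often each occurs,
// and keep the least frequent length.
var leastFrequentLength = sharedLengths
    .GroupBy(len => len)
    .OrderBy(group => group.Count())
    .First()
    .Key;

// Step 3: cut every item to that length and drop duplicates.
var result = list
    .Select(item => item.Substring(0, leastFrequentLength))
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", sharedLengths)); // 6, 6, 6, 6, 2
Console.WriteLine(leastFrequentLength);              // 2
Console.WriteLine(string.Join(", ", result));        // in
```

Two things to watch for if you try this on real data: when two lengths are equally rare, the one that shows up first in the list wins in this sketch, and Substring throws ArgumentOutOfRangeException if any item is shorter than the chosen length.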
After reading the comments above, I see that there's also an interest in finding, for each term, the longest leading substring that at least one other term also starts with, keeping only the distinct results. Here's more novel code for that:
    list.Select((item, index) => new {
            Index = index,
            Length = Enumerable.Range(1, item.Length)
                .Reverse()
                .First(proposedLength => list.Count(innerItem => innerItem.StartsWith(item.Substring(0, proposedLength))) > 1)
        })
        .Select(n => list[n.Index].Substring(0, n.Length))
        .Distinct()
In short, iterate through each item in the list and collect the index of the entry together with the length of the longest leading substring of that element that at least one other entry in the list also starts with. Follow that by taking the substring for each Index/Length pair and keeping only the distinct set of strings.
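One caveat with the query as written is that First throws InvalidOperationException when an entry shares no leading substring with any other entry. Here's a hedged sketch of one way around that, assuming you'd rather skip such entries than fail. The helper name LongestSharedPrefixes, the sample list, and the use of StringComparison.Ordinal are my own assumptions, and it's written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the distinct longest shared leading substrings.
static List<string> LongestSharedPrefixes(IReadOnlyList<string> list) =>
    list.Select(item =>
        {
            // FirstOrDefault yields 0 when nothing is shared, where First would throw.
            int length = Enumerable.Range(1, item.Length)
                .Reverse()
                .FirstOrDefault(len => list.Count(other =>
                    other.StartsWith(item.Substring(0, len), StringComparison.Ordinal)) > 1);
            return item.Substring(0, length);
        })
        .Where(prefix => prefix.Length > 0) // drop entries that share nothing
        .Distinct()
        .ToList();

// Hypothetical sample data; "zebra" shares nothing with the other entries.
var sample = new List<string> { "internet", "internal", "interview", "interval", "inside", "zebra" };
var prefixes = LongestSharedPrefixes(sample);

Console.WriteLine(string.Join(", ", prefixes)); // intern, interv, in
```

This version also drops the Index/Length pair, since the substring can be taken directly inside the same select.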