3

This is my scenario!

List<String> list = new List<String>();
list.Add("E9215001");
list.Add("E9215045");
list.Add("E1115001");
list.Add("E1115022");
list.Add("E1115003");
list.Add("E2115041");
list.Add("E2115042");
list.Add("E4115021");
list.Add("E5115062");

I need to extract the following common parts from the above list using C# and LINQ:

E92150 -> Extracted From {*E92150*01, *E92150*45}

E11150 -> Extracted From {*E11150*01, *E11150*22, *E11150*03}

E21150 -> Extracted From {*E21150*41, *E21150*42}

E41150 -> Extracted From {*E41150*21}

E51150 -> Extracted From {*E51150*62}

UPDATE: Thank you, everyone! With the help of @mlorbetske and @shelleybutterfly, I've figured it out!

Solution:

list.Select((item, index) => new {
        Index = index,
        Length = Enumerable.Range(1, item.Length - 2) // I'm ignoring the last 2 characters
                           .Reverse()
                           .First(proposedLength => list.Count(innerItem =>
                               innerItem.StartsWith(item.Substring(0, proposedLength))) > 1)
    })
    .Select(n => list[n.Index].Substring(0, n.Length))
    .Distinct()
  • 3
    Are the strings always the same length? Commented Dec 22, 2012 at 10:39
  • 2
    Would "E1115001" and "E1115003" be considered common as "E111500" etc., or only if all elements start with a common value? Commented Dec 22, 2012 at 10:50
  • No, they are not always the same. The first 6 chars are not always constant; they might also vary. I've edited my question. Check now! Thanks in advance! Commented Dec 22, 2012 at 10:59
  • 1
    Okay, so does that mean you need to extract something different RE: sa_ddam213's question above; e.g. so with the list of stuff you have above will you need to extract E11150 and E111500 since both of those are repeated? [oops E11150 isn't repeated as far as I see actually] Commented Dec 22, 2012 at 11:05
  • 2
    Okay; the other way I see this possibly working would be to extract the longest common string (harder problem) which gives me the list: "E92150", "E111500", "E211504", and "E". If that's what you need I will take a look. Commented Dec 22, 2012 at 11:13

4 Answers

5

I doubt that this is what you're looking for, however

var result = list.Select(s => s.Substring(0, 6))
                 .Distinct();

3 Comments

Identical to my answer.. discarded!
Also, for strings s of lengths exceeding six, s.Remove(6) is equivalent to s.Substring(0, 6).
Thank you! But the first 6 chars are not always constant; they might also vary. I've edited my question. Check now! Thanks in advance!
1

I'm not sure what the criteria for determining matches are, so I've written this. It's completely novel, and it's a 99.9999% certainty that it's not actually what you want.

Essentially, the outer select gets all the substrings of the determined length.

The first inner select determines the maximum length of this string that was found in at least one other string in the list.

The group by (following the first inner select) groups the found lengths by themselves.

This grouping is then converted to a dictionary of the length versus the number of times it was found.

We then order that set of groupings by the frequency (Value) with which each length was found, ascending.

Next, we take that actual length (the least frequently occurring length - from Key) and spit it back out into the second parameter of Substring so we take the substrings from 0 to that length. Of course, we're back in the outer select now, so we're actually getting values (hooray!).

Now, we take the distinct set of values from that result and voila!

list.Select(
    item => item.Substring(0, 
        list.Select(
            innerItem => Enumerable.Range(1, innerItem.Length)
                           .Reverse()
                           .First(proposedlength => list.Count(innerInnerItem => innerInnerItem.StartsWith(innerItem.Substring(0, proposedlength))) > 1)
                   )
            .GroupBy(length => length)
            .ToDictionary(grouping => grouping.Key, grouping => grouping.Count())
            .OrderBy(pair => pair.Value)
            .Select(pair => pair.Key)
            .First())
        ).Distinct()

After reading the comments above, I see that there's also an interest in finding the distinct longest substrings present in any of the others for each term. Here's more novel code for that:

list.Select((item, index) => new {
    Index=index, 
    Length=Enumerable.Range(1, item.Length)
                     .Reverse()
                     .First(proposedLength => list.Count(innerItem => innerItem.StartsWith(item.Substring(0, proposedLength))) > 1)
}).Select(n => list[n.Index].Substring(0, n.Length))
  .Distinct()

In short, iterate through each item in the list and collect the index of the entry and the longest substring from the beginning of that element that may be found in at least one other entry in the list. Follow that by collecting all the substrings from each Index/Length pair and taking only the distinct set of strings.
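As a rough, self-contained sketch of the second query (assuming C# 9 or later for top-level statements; the variable name `prefixes` and the use of `StringComparison.Ordinal` are my additions, not part of the answer), this runs the same longest-shared-prefix search over the list from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string>
{
    "E9215001", "E9215045", "E1115001", "E1115022", "E1115003",
    "E2115041", "E2115042", "E4115021", "E5115062"
};

// For each item, try prefixes from longest to shortest and keep the first one
// that more than one entry in the list starts with (the item itself counts once).
var prefixes = list
    .Select(item => Enumerable.Range(1, item.Length)
        .Reverse()
        .Select(length => item.Substring(0, length))
        .First(prefix => list.Count(other =>
            other.StartsWith(prefix, StringComparison.Ordinal)) > 1))
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", prefixes));
```

On the question's list this yields E92150, E111500, E11150, E211504 and E. The lone E comes from E4115021 and E5115062, which share only their first letter with other entries.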

6 Comments

The 2nd Solution is Excellent!!!! Thank You!!!!! Thank You!!! I just achieved it by modifying Length = Enumerable.Range(1, (item.Length - 2)) also I thank @shelleybutterfly for your effort..
I agree about the genius part. :) hey @mlorbetske; I tried generating some random strings of numbers to test out my solution (not yet posted) vs. yours, and it seems to choke; does this assume all the strings are the same length?
Hmm, I set all the strings to the same length, and it still is failing, does it assume there will at least be some matches, maybe?
@shelleybutterfly I use this to ease the attendance-marking process for the faculty. Usually, within a class, roll numbers are of the same length, and only the last 2 digits differ for every student. If a comma is pressed after typing a roll number, I automatically load the most common part next, so staff can type the remaining 2 digits instead of typing the whole roll number. However, in some cases a class may contain students who were transferred from other classes. In that case I need to find the multiple common parts. Now, thanks to you guys, I'm able to load the first most common part after a comma press; if ALT+Comma is pressed, I load the next most common part, and I cycle through them.
@shelleybutterfly There are 3 assumptions made: 1) none of the strings are null; 2) all of the strings have at least one letter in common; 3) (depending on which one you used) all the strings are at least as long as the common substring length.
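If those assumptions do not hold, `First` throws an `InvalidOperationException` when an item shares no prefix with any other entry. A minimal sketch of a more forgiving variant follows; the helper name `SharedPrefixes` is hypothetical, and skipping null or empty entries and unmatched items is my assumption about the desired behaviour, not something stated in the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: distinct longest shared prefixes, ignoring null/empty
// entries and entries that share no prefix with any other entry.
static List<string> SharedPrefixes(IEnumerable<string> source)
{
    var items = source.Where(s => !string.IsNullOrEmpty(s)).ToList();
    return items
        .Select(item => Enumerable.Range(1, item.Length)
            .Reverse()
            .Select(length => item.Substring(0, length))
            // FirstOrDefault returns null instead of throwing when nothing matches.
            .FirstOrDefault(prefix => items.Count(other =>
                other.StartsWith(prefix, StringComparison.Ordinal)) > 1))
        .OfType<string>() // drops the nulls produced for unmatched items
        .Distinct()
        .ToList();
}

var result = SharedPrefixes(new[] { "A100", "A101", "B200", null, "" });
Console.WriteLine(string.Join(", ", result));
```

Here "B200" has no partner, so only the prefix shared by "A100" and "A101" is returned.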
1

Does it need to be inline query syntax? If so, how about:

var result =
    from item in list
    select item.Substring(0,6);

or with the Distinct requirement:

var result =
    (
        from item in list
        select item.Substring(0,6)
    )
    .Distinct();

1 Comment

Thank you! But the first 6 chars are also not constant; they might also vary. I've edited my question. Check now! Thanks in advance!
0

SOLVED! Thanks to @mlorbetske and @shelleybutterfly

list.Select((item, index) => new {
        Index = index,
        Length = Enumerable.Range(1, item.Length - 2) // I don't need the last 2 chars, so I'm ignoring them
                           .Reverse()
                           .First(proposedLength => list.Count(innerItem =>
                               innerItem.StartsWith(item.Substring(0, proposedLength))) > 1)
    })
    .Select(n => list[n.Index].Substring(0, n.Length))
    .Distinct()
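One thing to watch for: on the sample list from the question, this query returns E92150, E11150, E21150 and E, because E4115021 and E5115062 share only their first letter with other entries. If the varying part really is always the last two characters (an assumption based on the roll-number description in the comments), a simpler sketch is to group on everything except those two characters. The variable name `groups` is mine, and C# 9 or later is assumed for top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string>
{
    "E9215001", "E9215045", "E1115001", "E1115022", "E1115003",
    "E2115041", "E2115042", "E4115021", "E5115062"
};

// Assumption: only the last two characters vary, so the common part is the rest.
var groups = list
    .Where(s => s.Length > 2) // skip entries too short to have a common part
    .GroupBy(s => s.Substring(0, s.Length - 2))
    .ToDictionary(g => g.Key, g => g.ToList());

foreach (var pair in groups)
{
    var members = string.Join(", ", pair.Value);
    Console.WriteLine(pair.Key + " -> Extracted From {" + members + "}");
}
```

This produces the five groups listed in the question, including E41150 and E51150, which have a single member each.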
