I have a List<List<string>> in C#. The number of items in the parent list varies: there could be 1 or there could be 5. I need to know whether there are any duplicates when considering the values at the same position across all lists.

This is very similar to a database unique constraint on a composite key, where you can't have duplicates. Each List contains all of the values from one column of the table.

For example if I have the following structure (but each one can have just 1 column or more):

Product    Color    Size
Tshirt     Blue     S
Tshirt     Blue     M
Tshirt     Blue     L
Tshirt     Blue     S <-- this is a duplicate
Tshirt     Red      S

This would be

var items = new List<List<string>>()
{
    new List<string>() { "Tshirt", "Tshirt", "Tshirt", "Tshirt", "Tshirt", },
    new List<string>() { "Blue", "Blue", "Blue", "Blue", "Red", },
    new List<string>() { "S", "M", "L", "S", "S", },
};

And I would need to detect the fact that there are duplicates and print the duplicates as

Duplicate: Tshirt, Blue, S

Note: finding a duplicate in a single list as addressed in the referenced 'duplicate' is easy, and finding duplicates if the list is static is tackleable, but this is different in that the size is completely unknown. It could really be a List<List<string>> that has 0 elements, 1 or more.

  • Why do you have a List<List<string>>? That is just bizarre and makes what you want to do more difficult. Commented Feb 28, 2019 at 6:34
  • Please also confirm for me that I've validly transformed your data in C# code. Commented Feb 28, 2019 at 6:36
  • @IanKemp - It's not a duplicate. Commented Feb 28, 2019 at 6:37
  • Not a duplicate: the referenced question is about a single list, and its number of columns doesn't change dynamically. Commented Feb 28, 2019 at 6:40
  • The shorter path would be to make it anything other than a list of lists. My weirdest idea would be some string concatenation by "column", then comparing the resulting strings. With a good enough separator, that should be enough. Commented Feb 28, 2019 at 8:23

2 Answers

Give this a go:

var items = new List<List<string>>()
{
    new List<string>() { "Tshirt", "Tshirt", "Tshirt", "Tshirt", "Tshirt", },
    new List<string>() { "Blue", "Blue", "Blue", "Blue", "Red", },
    new List<string>() { "S", "M", "L", "S", "S", },
};

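var duplicates =
    Enumerable
        .Range(0, items.First().Count)   // one index per row
        // Assumes exactly three inner lists: product, color, size.
        .Select(x => new { Product = items[0][x], Color = items[1][x], Size = items[2][x], })
        .GroupBy(x => x)                 // anonymous types compare by value
        .SelectMany(x => x.Skip(1).Take(1))   // one representative per duplicated group
        .ToArray();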

That gives:

[image: duplicates]

Given the need to handle a variable number of inner lists, here's how to do it:

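var duplicates =
    Enumerable
        .Range(0, items.First().Count)   // one index per row
        // Read row x across every inner list, however many there are.
        .Select(x => Enumerable.Range(0, items.Count).Select(y => items[y][x]).ToArray())
        .GroupBy(x => String.Join("|", x))   // key is the joined row; a "|" inside a value could cause false matches
        .SelectMany(x => x.Skip(1).Take(1))  // one representative per duplicated group
        .ToArray();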

That gives:

[image: duplicates2]

Here's a lazy version that doesn't use Count:

var duplicates =
    items
        .Select(xs => xs.Select(y => Enumerable.Repeat(y, 1)))
        .Aggregate((z0s, z1s) => z0s.Zip(z1s, (z0, z1) => z0.Concat(z1)))
        .GroupBy(ws => String.Join("|", ws))
        .SelectMany(gws => gws.Skip(1).Take(1));
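
To print the result in the form the question asks for, the variable-column query above could be wrapped roughly as follows. This is a minimal sketch, not part of the original answer: the FindDuplicateRows name is hypothetical, the early return for an empty outer list is an added assumption (items.First() would otherwise throw), and it is written as C# 9+ top-level statements so it can run as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original answer): one row per duplicated combination.
static List<string[]> FindDuplicateRows(List<List<string>> columns)
{
    // Assumption: an empty outer list simply has no duplicates.
    if (columns.Count == 0) return new List<string[]>();

    return Enumerable
        .Range(0, columns[0].Count)                                // one index per row
        .Select(row => columns.Select(c => c[row]).ToArray())      // read that row across all columns
        .GroupBy(values => string.Join("|", values))               // assumes "|" never occurs in the data
        .SelectMany(g => g.Skip(1).Take(1))                        // one representative per duplicated group
        .ToList();
}

var items = new List<List<string>>
{
    new List<string> { "Tshirt", "Tshirt", "Tshirt", "Tshirt", "Tshirt" },
    new List<string> { "Blue", "Blue", "Blue", "Blue", "Red" },
    new List<string> { "S", "M", "L", "S", "S" },
};

var duplicateRows = FindDuplicateRows(items);
foreach (var row in duplicateRows)
    Console.WriteLine("Duplicate: " + string.Join(", ", row));
// prints "Duplicate: Tshirt, Blue, S"
```
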

10 Comments

I can try it, but I immediately notice that you have the length of the items array hardcoded in the Select statement. The List<List<string>> I am getting may have 1 element that I need to detect duplicates in, or 2, 3, or 5, and they aren't always 'product/size/color'. What they are is unknown to me; I just need to know if there are duplicates.
@esac - I haven't hard-coded the length - I've used items.First().Count. But you're saying that the number of List<string> is variable in the List<List<string>> so I can't use new { Product = items[0][x], Color = items[1][x], Size = items[2][x], }, right?
Correct, the inner and outer lists both vary in the number of items. Just woke up and trying your suggestion now!
Your answer(s) work great. If you feel kind, I also have a version I am working on where it is a Dictionary<string, List<string>>: the key is the column name and the value is the List<string>. The end result needs to be a List of duplicates where each item is an anonymous class with the keys as the names and the duplicate values as the values.
@esac - I'm kind, but you'd need to ask a new question AND show that it is not a duplicate of this one. Put a link here when you post it.

You can use Zip() and Aggregate() LINQ methods to find duplicates in a List<List<string>> (or even List<List<object>>):

string separator = ";%;*;%;";   // Pick a string that's very unlikely to appear in results

var duplicates = items.Aggregate((currentList, nextList) =>
                            currentList.Zip(nextList, (currentListItem, nextListItem) =>
                                $"{currentListItem}{separator}{nextListItem}").ToList())
                      .GroupBy(item => item).Where(group => group.Count() > 1)
                      .Select(item => item.Key.Split(new[] { separator }, StringSplitOptions.None)
                            .ToList())
                      .ToList();

The Aggregate() method effectively loops through the outer list, considering two inner lists at a time. Each pair is zipped together, item by item, to produce a new IEnumerable<string>. The ToList() call is necessary because this result becomes the next accumulator passed to Aggregate(), and it must have the same type as the next List<string>.
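
To make the folding concrete, the sketch below performs the two Zip steps by hand on the question's sample data and then checks that Aggregate() produces the same list. It is only an illustration: the variable names afterFirstFold, afterSecondFold and aggregated are made up for this example, and it assumes C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<List<string>>
{
    new List<string> { "Tshirt", "Tshirt", "Tshirt", "Tshirt", "Tshirt" },
    new List<string> { "Blue", "Blue", "Blue", "Blue", "Red" },
    new List<string> { "S", "M", "L", "S", "S" },
};
string separator = ";%;*;%;";

// First fold: list 0 zipped with list 1, item by item.
List<string> afterFirstFold = items[0]
    .Zip(items[1], (left, right) => $"{left}{separator}{right}")
    .ToList();
Console.WriteLine(afterFirstFold[0]);   // Tshirt;%;*;%;Blue

// Second fold: the accumulated list zipped with list 2.
List<string> afterSecondFold = afterFirstFold
    .Zip(items[2], (left, right) => $"{left}{separator}{right}")
    .ToList();
Console.WriteLine(afterSecondFold[0]);  // Tshirt;%;*;%;Blue;%;*;%;S

// Aggregate() performs the same folds in one call.
List<string> aggregated = items.Aggregate((current, next) =>
    current.Zip(next, (left, right) => $"{left}{separator}{right}").ToList());
Console.WriteLine(aggregated.SequenceEqual(afterSecondFold));  // True
```
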

Once all inner List<string> have been zipped into a new IEnumerable<string>, where the items are concatenated together with a separator (very important to have the separator, to avoid false positives on duplicate matching, as "aa" + "abb" == "aaa" + "bb"), you can simply GroupBy() the items, and find any group containing more than 1 item.
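
The false-positive case mentioned above can be checked directly. This is a small illustrative snippet (C# 9+ top-level statements assumed; the variable names are made up for the example):

```csharp
using System;

// Without a separator, two different pairs collapse into the same key.
string plainA = "aa" + "abb";    // "aaabb"
string plainB = "aaa" + "bb";    // "aaabb"
Console.WriteLine(plainA == plainB);   // True: a false positive

// With a separator between the parts, the keys stay distinct.
string separator = ";%;*;%;";
string joinedA = "aa" + separator + "abb";
string joinedB = "aaa" + separator + "bb";
Console.WriteLine(joinedA == joinedB); // False
```
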

Finally, the last Select() converts the result back to List<List<string>> format, for easy comparison with original data.

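This solution is fully LINQ (you can even hard-code the string separator directly into the query) and works for any number of inner lists, including only 1 list. An empty outer list is the exception: Aggregate() without a seed throws an InvalidOperationException on an empty sequence, so that case needs a separate check.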
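
If no separator can be trusted not to appear in the data, one alternative (a different technique from the one in this answer) is to build the grouping key from length-prefixed values, so no character is reserved. The sketch below is only an illustration under stated assumptions: the RowKey helper is hypothetical, the sample data is made up to contain "|", values are assumed non-null, all inner lists are assumed to have the same length, and the code uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: encodes each value as "<length>:<value>", so the key is
// unambiguous whatever characters the values contain. Assumes non-null values.
static string RowKey(IEnumerable<string> values) =>
    string.Concat(values.Select(v => v.Length + ":" + v));

// Made-up sample data in which a plain "|" join would make every row look identical.
var items = new List<List<string>>
{
    new List<string> { "a|b", "a", "a|b" },
    new List<string> { "c", "b|c", "c" },
};

// Assumption: an empty outer list has no rows and therefore no duplicates.
int rowCount = items.Count == 0 ? 0 : items[0].Count;

var duplicates = Enumerable
    .Range(0, rowCount)
    .Select(row => items.Select(column => column[row]).ToList())  // read one row across all columns
    .GroupBy(values => RowKey(values))
    .Where(group => group.Count() > 1)
    .Select(group => group.First())
    .ToList();

foreach (var row in duplicates)
    Console.WriteLine("Duplicate: " + string.Join(", ", row));
// prints "Duplicate: a|b, c"
```

The trade-off is a slightly longer key in exchange for not having to pick a "very unlikely" separator string.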
