Using lambda expressions to get a subset where array elements are equal

Question

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.

I have the following code:

List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
List<string[]> newList = list.FindAll(predicate(string[] line){ 
    return (???);
});

I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.

list contains data like this:

"000", "Data", "more data", "etc..."

list2 contains data like this:

"000", "different data", "even more different data"

Fundamentally, I could write this code like this:

List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
    foreach(var e2 in list2)
    {
        if (e[0] == e2[0])
            newList.Add(e);
    }
}
return newList;

But I'm trying to use generics and lambdas more, so I'm looking for a nice clean solution. This one is frustrating me, though. Maybe a Find inside of a Find?

EDIT: Marc's answer below led me to experiment with a variation that looks like this:

var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();

I'm not sure how efficient this is, but it works and is sufficiently succinct. Does anyone else have any suggestions?

Marc Gravell · Accepted Answer · 2009-02-15 09:37:14Z

11

You could join? I'd use two steps myself, though:

var keys = new HashSet<string>(list2.Select(x => x[0]));
var data = list.Where(x => keys.Contains(x[0]));

If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:

var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);

Note, though, that the Find approach is O(n*m), whereas the HashSet<> approach is O(n+m).
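To make the two options concrete, here is a minimal, self-contained sketch of the HashSet<> filter alongside the join mentioned at the top of this answer. The class name KeyFilterDemo, the method names, and the sample rows are made up for illustration. The join calls Distinct() on the keys, which is an added assumption so that a key occurring twice in list2 does not return the matching row twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, not part of the original answer.
public static class KeyFilterDemo
{
    // Keep rows of 'list' whose first element appears as a first element in 'list2'.
    // Building the set is O(m); each Contains is O(1) on average, so O(n + m) overall.
    public static List<string[]> FilterWithHashSet(List<string[]> list, List<string[]> list2)
    {
        var keys = new HashSet<string>(list2.Select(x => x[0]));
        return list.Where(x => keys.Contains(x[0])).ToList();
    }

    // Same filter expressed as a join. Distinct() keeps a key that occurs
    // twice in 'list2' from emitting the matching row twice.
    public static List<string[]> FilterWithJoin(List<string[]> list, List<string[]> list2)
    {
        return list.Join(list2.Select(x => x[0]).Distinct(),
                         row => row[0],
                         key => key,
                         (row, key) => row)
                   .ToList();
    }

    public static void Main()
    {
        var list = new List<string[]>
        {
            new[] { "000", "Data", "more data" },
            new[] { "001", "Other", "row" },
            new[] { "002", "Third", "row" }
        };
        var list2 = new List<string[]>
        {
            new[] { "000", "different data" },
            new[] { "002", "x" },
            new[] { "002", "duplicate key" }
        };

        // Prints "000, Data, more data" and then "002, Third, row".
        foreach (var row in FilterWithHashSet(list, list2))
            Console.WriteLine(string.Join(", ", row));
    }
}
```

With fewer than ten keys, as described in the comments below, either method should perform about the same; the set-based version mainly pays off as list2 grows.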

edited Feb 15, 2009 at 9:37

answered Feb 15, 2009 at 9:08

Marc Gravell


8 Comments

Erik Funkenbusch Over a year ago

What is the reason for the HashSet? It seems to work well without the HashSet (see my edit above). Does the HashSet make it more efficient?

Marc Gravell Over a year ago

Note: for very small lists, it can be more efficient to just scan the list (like your edit does), but for small lists it is going to be very fast no matter what approach you use. As the list size increases, the scan approach can quickly become a bottleneck.

Erik Funkenbusch Over a year ago

OK. One piece of information I should mention is that the "keys" (or list2) will always be relatively small, probably fewer than 10, while the source (list) can be several hundred elements (up to 1000).

Marc Gravell Over a year ago

But in general: much more efficient, yes. Firstly, it only keeps the distinct keys; secondly, it uses a hash algorithm (similar to dictionary) so that Contains tends to O(1) rather than O(n) [essentially, think of it as an "index" in database terms].

Marc Gravell Over a year ago

If you have < 10 keys (list2), then either approach should be fine.


David Wengier · Accepted Answer · 2009-02-15 09:12:48Z

3

You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.

    static void Main(string[] args)
    {
        List<string[]> data1 = new List<string[]>();
        List<string[]> data2 = new List<string[]>();

        var result = data1.Intersect(data2, new Comparer());
    }

    class Comparer : IEqualityComparer<string[]>
    {
        #region IEqualityComparer<string[]> Members

        bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
        {
            return x[0] == y[0];
        }

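        int IEqualityComparer<string[]>.GetHashCode(string[] obj)
        {
            // Hash the same key that Equals compares; obj.GetHashCode() is
            // reference-based for arrays, so equal keys would never match.
            return obj[0] == null ? 0 : obj[0].GetHashCode();
        }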

        #endregion
    }
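As a rough illustration of how a key-based comparer behaves with Intersect, here is a self-contained sketch. FirstElementComparer, IntersectDemo, and the sample rows are hypothetical names and data chosen for this example. The comparer hashes the first element so that GetHashCode agrees with Equals. Note that Intersect has set semantics: rows in the first list that share a key collapse to the first such row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two arrays are "equal" when their first elements match.
public sealed class FirstElementComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        return x[0] == y[0];
    }

    // Must agree with Equals: arrays with the same first element
    // have to produce the same hash code.
    public int GetHashCode(string[] obj)
    {
        return obj[0] == null ? 0 : obj[0].GetHashCode();
    }
}

public static class IntersectDemo
{
    // Rows of data1 whose key also occurs in data2, one row per distinct key.
    public static List<string[]> KeyIntersect(List<string[]> data1, List<string[]> data2)
    {
        return data1.Intersect(data2, new FirstElementComparer()).ToList();
    }

    public static void Main()
    {
        var data1 = new List<string[]>
        {
            new[] { "000", "Data" },
            new[] { "001", "Other" },
            new[] { "000", "Again" }   // same key as the first row
        };
        var data2 = new List<string[]>
        {
            new[] { "000", "different data" }
        };

        // Prints only "000, Data": the second "000" row is dropped
        // because Intersect returns distinct elements.
        foreach (var row in KeyIntersect(data1, data2))
            Console.WriteLine(string.Join(", ", row));
    }
}
```

If every matching row from the first list is needed, including rows with repeated keys, a Where filter over a set of keys is likely the better fit.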

answered Feb 15, 2009 at 9:12

David Wengier


2 Comments

Erik Funkenbusch Over a year ago

Interesting solution, but it's larger than the original problem ;)

David Wengier Over a year ago

Depends on the context, really. If this comes up repeatedly, put your equality comparer in a library assembly and you can use one simple call to Intersect to get the intersection of your two lists. Later on, if you need to, you can use the same comparer for Equals, or Except, or other uses.

DarcyThomas · Accepted Answer · 2012-07-24 23:50:50Z

~~Intersect may work for you. Intersect finds all the items that are in both lists.~~ OK, I re-read the question. Intersect doesn't take order into account. I have written a slightly more complex LINQ expression that returns the items that are in the same position (index) with the same value in both lists.

List<String> list1 = new List<String>() {"000","33", "22", "11", "111"};
List<String> list2 = new List<String>() {"000", "22", "33", "11"};

List<String> subList = list1.Select ((value, index) => new { Value = value, Index = index})
             .Where(w => list2.Skip(w.Index).FirstOrDefault() == w.Value )
             .Select (s => s.Value).ToList();


Result: {"000", "11"}

Explanation of the query:

Select a set of values and position of that value.

Filter that set where the item in the same position in the second list has the same value.

Select just the value (not the index as well).

Note that I used list2.Skip(w.Index).FirstOrDefault() instead of list2[w.Index] so that it will handle lists of different lengths.

If you know the lists will be the same length, or that list1 will always be the shorter one, then list2[w.Index] would probably be a bit faster.
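As an alternative that is not part of the answer above, the same position-by-position comparison can be sketched with Enumerable.Zip, assuming .NET 4 or later is available. Zip stops at the end of the shorter list, so lists of different lengths are handled without an index lookup. PositionalMatchDemo and SamePositionMatches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, shown only as an alternative sketch.
public static class PositionalMatchDemo
{
    // Pair up items by position and keep the values that are equal in both lists.
    public static List<string> SamePositionMatches(List<string> list1, List<string> list2)
    {
        return list1.Zip(list2, (a, b) => new { A = a, B = b })
                    .Where(p => p.A == p.B)
                    .Select(p => p.A)
                    .ToList();
    }

    public static void Main()
    {
        var list1 = new List<string> { "000", "33", "22", "11", "111" };
        var list2 = new List<string> { "000", "22", "33", "11" };

        // Prints "000, 11", matching the result shown above.
        Console.WriteLine(string.Join(", ", SamePositionMatches(list1, list2)));
    }
}
```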
