5

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.

I have the following code:

List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
List<string[]> newList = list.FindAll(predicate(string[] line){ 
    return (???);
});

I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.

list contains data like this:

"000", "Data", "more data", "etc..."

list2 contains data like this:

"000", "different data", "even more different data"

Fundamentally, I could write this code like this:

List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
    foreach(var e2 in list2)
    {
        if (e[0] == e2[0])
            newList.Add(e);
    }
}
return newList;

But I'm trying to use generics and lambdas more, so I'm looking for a nice, clean solution. This one is frustrating me, though. Maybe a Find inside of a Find?

EDIT: Marc's answer below led me to experiment with a variation that looks like this:

var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();

I'm not sure how efficient this is, but it works and is sufficiently succinct. Does anyone else have any suggestions?

3 Answers

11

You could join? I'd use two steps myself, though:

var keys = new HashSet<string>(list2.Select(x => x[0]));
var data = list.Where(x => keys.Contains(x[0]));
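For comparison, the join mentioned at the start of this answer could be sketched roughly as follows. The names `JoinSketch` and `FilterByJoin` and the sample data are made up for illustration, and the `Distinct()` call is an assumption added so that a key repeated in list2 does not emit the same row of list twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinSketch
{
    // Hypothetical helper: keeps the rows of list whose element 0
    // also appears as element 0 of some row in list2.
    public static List<string[]> FilterByJoin(List<string[]> list, List<string[]> list2)
    {
        // Distinct keys, so a key repeated in list2 does not duplicate rows of list.
        var keys = list2.Select(x => x[0]).Distinct();

        return list.Join(keys, row => row[0], key => key, (row, key) => row).ToList();
    }

    static void Main()
    {
        var list = new List<string[]>
        {
            new[] { "000", "Data", "more data" },
            new[] { "001", "other", "row" },
        };
        var list2 = new List<string[]>
        {
            new[] { "000", "different data" },
            new[] { "000", "duplicate key" },
        };

        // Prints: 000, Data, more data
        foreach (var row in FilterByJoin(list, list2))
            Console.WriteLine(string.Join(", ", row));
    }
}
```

Like the HashSet<> version, Join builds a hash-based lookup over the keys, so the cost should stay close to O(n+m).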

If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:

var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);

Note, though, that the Find approach is O(n*m), whereas the HashSet<> approach is O(n+m).

8 Comments

What is the reason for the HashSet? It seems to work well without the HashSet (see my edit above). Does the HashSet make it more efficient?
Note: for very small lists, it can be more efficient to just scan the list [like your edit does]... but for small lists it is going to be very fast no matter what approach you use. As the list size increases, the scan approach can quickly become a bottleneck.
OK. One piece of information I should mention is that the "keys" (or list2) will always be relatively small, probably fewer than 10, while the source (list) can be several hundred elements (up to 1000).
But in general: much more efficient, yes. Firstly, it only keeps the distinct keys; secondly, it uses a hash algorithm (similar to dictionary) so that Contains tends to O(1) rather than O(n) [essentially, think of it as an "index" in database terms].
If you have < 10 keys (list2), then either approach should be fine.
3

You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.

    static void Main(string[] args)
    {
        List<string[]> data1 = new List<string[]>();
        List<string[]> data2 = new List<string[]>();

        var result = data1.Intersect(data2, new Comparer());
    }

    class Comparer : IEqualityComparer<string[]>
    {
        #region IEqualityComparer<string[]> Members

        bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
        {
            return x[0] == y[0];
        }

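        int IEqualityComparer<string[]>.GetHashCode(string[] obj)
        {
            // Hash the same element that Equals compares. Hashing the array
            // itself is reference-based, so rows with equal keys would never match.
            return obj[0].GetHashCode();
        }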

        #endregion
    }
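A small sketch of how this might be exercised with sample data. `FirstElementComparer`, `IntersectSketch`, and `KeyIntersect` are hypothetical names, and the comparer here hashes element 0 so that GetHashCode agrees with Equals. One thing worth knowing: Intersect yields distinct elements, so rows of the first list that share a key collapse to the first such row, which may or may not be what the question wants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when their first elements match.
class FirstElementComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        return x[0] == y[0];
    }

    public int GetHashCode(string[] obj)
    {
        // Must be derived from the same element that Equals compares.
        return obj[0].GetHashCode();
    }
}

static class IntersectSketch
{
    // Rows of data1 whose key (element 0) also occurs in data2, one row per key.
    public static List<string[]> KeyIntersect(List<string[]> data1, List<string[]> data2)
    {
        return data1.Intersect(data2, new FirstElementComparer()).ToList();
    }

    static void Main()
    {
        var data1 = new List<string[]>
        {
            new[] { "000", "first" },
            new[] { "000", "second" },   // same key: dropped by Intersect
            new[] { "001", "no match" },
        };
        var data2 = new List<string[]> { new[] { "000", "different data" } };

        // Prints: 000, first
        foreach (var row in KeyIntersect(data1, data2))
            Console.WriteLine(string.Join(", ", row));
    }
}
```

If every matching row should be kept, including rows with repeated keys, a Where with a key lookup is probably the better fit than Intersect.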

2 Comments

Interesting solution, but it's larger than the original problem ;)
Depends on the context, really. If this comes up repeatedly, put your equality comparer in a library assembly and you can use one simple call to Intersect to get the intersection of your two lists. Later on, if you need to, you can use the same comparer for Equals, Except, or other uses.
0

Intersect may work for you. Intersect finds all the items that are in both lists. OK, I re-read the question: Intersect doesn't take the order into account. I have written a slightly more complex LINQ expression that will return a list of items that are in the same position (index) with the same value.

List<String> list1 = new List<String>() {"000","33", "22", "11", "111"};
List<String> list2 = new List<String>() {"000", "22", "33", "11"};

List<String> subList = list1.Select ((value, index) => new { Value = value, Index = index})
             .Where(w => list2.Skip(w.Index).FirstOrDefault() == w.Value )
             .Select (s => s.Value).ToList();


Result: {"000", "11"}

Explanation of the query:

Select a set of values and position of that value.

Filter that set where the item in the same position in the second list has the same value.

Select just the value (not the index as well).

Note: I used list2.Skip(w.Index).FirstOrDefault() instead of list2[w.Index] so that it will handle lists of different lengths.

If you know the lists will be the same length, or that list1 will always be shorter, then list2[w.Index] would probably be a bit faster.
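As an alternative sketch of the same position-by-position comparison, `Enumerable.Zip` (available from .NET 4.0) pairs the two lists element by element and stops at the end of the shorter one, so differing lengths are handled without calling Skip for every item. `PositionalSketch` and `SamePosition` are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PositionalSketch
{
    // Values that appear at the same index in both lists.
    public static List<string> SamePosition(List<string> list1, List<string> list2)
    {
        return list1.Zip(list2, (a, b) => new { A = a, B = b }) // pair items by index
                    .Where(p => p.A == p.B)                     // keep equal pairs
                    .Select(p => p.A)
                    .ToList();
    }

    static void Main()
    {
        var list1 = new List<string> { "000", "33", "22", "11", "111" };
        var list2 = new List<string> { "000", "22", "33", "11" };

        // Prints: 000, 11
        Console.WriteLine(string.Join(", ", SamePosition(list1, list2)));
    }
}
```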
