70

Let's say I have these two arrays:

var array1 = new[] {"A", "B", "C"};
var array2 = new[] {"A", "C", "D"};

I would like to get the differences between the two. I know I could write this in just a few lines of code, but I want to make sure I'm not missing a built in language feature or a LINQ extension method.

Ideally, I would end up with the following three results:

  • Items not in array1, but are in array2 ("D")
  • Items not in array2, but are in array1 ("B")
  • Items that are in both

5 Answers 5

131

If you've got LINQ available to you, you can use Except and Distinct. The sets you asked for in the question are respectively:

- array2.Except(array1)
- array1.Except(array2)
- array1.Intersect(array2)
Sign up to request clarification or add additional context in comments.

5 Comments

Do you know what kind of performance guarantees are one this? Presumably Except would have to make a sorted copy of each array first. I can't find any of this on MSDN.
No, it doesn't make a sorted copy. It creates a set from the excluded sequence, and then iterates over the source sequence, yielding any element which isn't in the excluded sequence.
(When I say "set" I mean "hash set".)
Thanks, I was hoping this would be something I could use in my application, but since my data is presorted and very large, the speed loss is too high.
I wouldn't discount Linq before having tried it, it and the compiler have a range of optimizations that can make it surprisingly fast. In the latest versions of the framework, AVX extensions are used to produce truly mind-boggling performance in some scenarios. Far faster than even a for iteration could do... Try it first, see if it's really as slow as you think it is... I doubt it will be.
13

from the MSDN 101 LINQ samples....

public void Linq52() {
    int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
    int[] numbersB = { 1, 3, 5, 7, 8 };

    IEnumerable<int> aOnlyNumbers = numbersA.Except(numbersB);

    Console.WriteLine("Numbers in first array but not second array:");
    foreach (var n in aOnlyNumbers) {
        Console.WriteLine(n);
    }
}

Comments

4

Here are the benchmarks of LINQ extension methods. The results were obtained during the development of a real program.

The tests: 2 lists (lst1 and lst2) each approximately 250000 objects. Each object (class Key) contains a string and an integer. The second list mostly contains the same entries as the first one, but some new entries are added and some are removed.

I tested the Except extension method.

var except = lst2.Except(lst1);

List lst = except.ToList();

These 2 lines produced 600 items list of “new additions”. I timed it using the StopWatch object. The speed is astonishing:220 ms. The computer I used is by no means a “speedy Gonzales”. Core 2 Duo T7700 – 2.4GHz.

Note:

Here is the class Key, which implements IEquatable i-face.

public class Key : IEquatable<Key>
{
    public int Index { get; private set; }
    public string Name { get; private set; }

    public Key(string keyName, int sdIndex)
    {
        this.Name = keyName;
        this.Index = sdIndex;
    }

 // IEquatable implementation
    public bool Equals(Key other)
    {
        //Check whether the compared object is null.
        if (Object.ReferenceEquals(other, null)) return false;
        //Check whether the compared object references the same data.
        if (Object.ReferenceEquals(this, other)) return true;
        //Check whether the products' properties are equal.
        return Index.Equals(other.Index) && Name.Equals(other.Name);
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.
    public override int GetHashCode()
    {
        //Get hash code for the name field if it is not null.
        int hashKeyName = Name == null ? 0 : Name.GetHashCode();
        //Get hash code for the index field.
        int hashKeyIndex = Index.GetHashCode();
        //Calculate the hash code for the Key.
        return hashKeyName ^ hashKeyIndex;
    }
}

Comments

4

I've had to do things similar to this with very large sets of data. If you're dealing with a few thousand or so, use the Linq stuff since it's much clearer. But if you know that your arrays are pre-sorted, running a merge like this can do it significantly faster, since it only makes one pass through the data and doesn't need to allocate as much memory as the Linq version.

int iA = 0;
int iB = 0;
List<int> inA = new List<int>();
List<int> inB = new List<int>();
List<int> inBoth = new List<int>();
while (iA < numbersA.Length && iB < numbersB.Length)
{
    if (numbersA[iA] < numbersB[iB])
    {
        inA.Add(numbersA[iA++]);
    }
    else if (numbersA[iA] == numbersB[iB])
    {
        inBoth.Add(numbersA[iA++]);
        ++iB;
    }
    else
    {
        inB.Add(numbersB[iB++]);
    }
}
while (iA < numbersA.Length)
{
    inA.Add(numbersA[iA++]);
}
while (iB < numbersB.Length)
{
    inB.Add(numbersB[iB++]);
}

Again, this is really only needed if you are dealing with hundreds of thousands of values.

Comments

1

Another solution would be like below as well

int[] arr1 = new int[] { 45, 26, 99, 55, 36 };
int[] arr2 = new int[] { 45, 26, 99, 20, 36 };

var res = arr1.Union(arr2).Except(arr1.Intersect(arr2));

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.