3

I'm new here and somewhat inexperienced with C#. I've been searching through the MSDN documentation and Google, but can't find the answer to this (I try to word it as generally as possible):

I want to store a fixed-length ordered sequence of integers in a list or array, and then make an array of these integer arrays. Does anyone know how I can count the number of distinct integer arrays, and what specific data type(s) I should use (List, plain array, etc.)? I don't have the exact code I've been working with, but here's something similar to what I have been trying:

int[] set1 = {2, 56, 8};
int[] set2 = {8, 25, 90};
int[] set3 = {2, 56, 8};

var superset = new List<int[]>();
superset.Add(set1);
superset.Add(set2);
superset.Add(set3);

Console.Out.WriteLine(superset.Distinct().Count());  //  would like this to output 2, but Distinct() doesn't seem to actually work and I would get 3
2
  • 1
    Do you absolutely need to use arrays of integers? If you make your fixed-length arrays into custom objects, then just implement IEquatable<T> on that custom class and .Distinct() will compare it natively. Commented Jan 4, 2012 at 20:05
  • +1 for thinking along the same lines. This could be much easier if we knew what the precise problem was. Commented Jan 4, 2012 at 20:08

5 Answers

5

The Distinct method has an overload which takes an instance of IEqualityComparer<T>. Create an implementation of IEqualityComparer<T> for an int array (i.e. public class IntArrayComparer : IEqualityComparer<int[]>) and pass an instance into the call to Distinct.

The SequenceEqual method might be of some help for the implementation of IEqualityComparer<int[]>.Equals but that exercise is left to you.
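For readers who want to see the shape of that exercise, here is a minimal sketch. The class name IntArrayComparer comes from the answer above; the hash formula (seed 17, multiplier 31) and the Demo.CountDistinct helper are assumptions chosen for illustration, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: compares int arrays by content instead of by reference.
public class IntArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;      // same instance, or both null
        if (x == null || y == null) return false;    // exactly one is null
        return x.SequenceEqual(y);                    // same length, same values, same order
    }

    public int GetHashCode(int[] array)
    {
        if (array == null) return 0;
        unchecked                                     // let the arithmetic wrap on overflow
        {
            int hash = 17;                            // illustrative seed (assumption)
            foreach (int value in array)
            {
                hash = hash * 31 + value;             // order-sensitive combination
            }
            return hash;
        }
    }
}

public static class Demo
{
    // Hypothetical helper showing how the comparer is passed to Distinct.
    public static int CountDistinct(IEnumerable<int[]> arrays)
    {
        return arrays.Distinct(new IntArrayComparer()).Count();
    }
}
```

Equal arrays must produce equal hash codes, which is why GetHashCode is computed from the same elements that Equals compares.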


3

You just need to create a Comparer class for the integer array, and pass an instance of it to the Distinct method.

Console.Out.WriteLine(superset.Distinct(new ArrayComparer()).Count());

Here's an example:

class ArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        //Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        if (x.Length != y.Length)
            return false;

        //Check whether the arrays' values are equal.
        for(int i = 0; i < x.Length; i++){
            if(x[i] != y[i])
                return false;
        }

        // If got this far, arrays are equal
        return true;
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.

    public int GetHashCode(int[] intArray)
    {
        //Check whether the object is null
        if (Object.ReferenceEquals(intArray, null)) return 0;

        //Calculate the hash code for the array by XOR-ing its elements.
        //Note: XOR ignores element order, so permutations of the same values
        //share a hash code. That is still valid, only less efficient.
        int hashCode = 0;
        foreach(int i in intArray){
            hashCode = hashCode ^ i;
        }
        return hashCode;
    }
}

That works for me. Gives the result you wanted.


0

None of the answers yet posted explains why Distinct().Count() returns 3: The reason is that Distinct() is using the default equality comparer for arrays, which compares for reference equality. This code would return 2:

int[] set1 = {2, 56, 8}; 
int[] set2 = {8, 25, 90}; 
int[] set3 = set1;

var superset = new List<int[]>(); 
superset.Add(set1); 
superset.Add(set2); 
superset.Add(set3); 

Console.WriteLine(superset.Distinct().Count()); 

As Bob and Richard suggest, you could overcome this by creating an implementation of IEqualityComparer<int[]> that would give you the desired behavior.
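The difference between reference equality and content equality can be checked directly. The following is a small sketch; the ArrayEqualityDemo class and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayEqualityDemo
{
    // What Distinct() uses when no comparer is supplied.
    public static bool DefaultEquals(int[] x, int[] y)
    {
        return EqualityComparer<int[]>.Default.Equals(x, y);
    }

    // Element-by-element comparison.
    public static bool ContentEquals(int[] x, int[] y)
    {
        return x.SequenceEqual(y);
    }

    public static void Run()
    {
        int[] a = { 2, 56, 8 };
        int[] b = { 2, 56, 8 };   // same values, different object
        int[] c = a;              // same object as a

        Console.WriteLine(DefaultEquals(a, b));  // False: two separate instances
        Console.WriteLine(DefaultEquals(a, c));  // True: one instance, two variables
        Console.WriteLine(ContentEquals(a, b));  // True: the contents match
    }
}
```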


0
private int CountDistinct2DPoints(double[][] data)
{
    // Maps each (x, y) point to the number of times it occurs.
    var pointsMap = new Dictionary<Tuple<double, double>, int>();
    for (int i = 0; i < data.Length; i++)
    {
        // Tuple compares by value, so equal coordinates give equal keys.
        var point = Tuple.Create(data[i][0], data[i][1]);
        if (!pointsMap.ContainsKey(point))
        {
            pointsMap.Add(point, 1);
        }
        else
        {
            pointsMap[point]++;
        }
    }
    // The number of distinct points is the number of keys.
    return pointsMap.Keys.Count;
}
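If only the number of distinct points is needed and not how often each one occurs, the same idea might be written with a HashSet. This is a sketch under that assumption; the PointCounter class name is hypothetical, and like the code above it reads only the first two entries of each row.

```csharp
using System;
using System.Collections.Generic;

public static class PointCounter
{
    // Counts distinct (x, y) pairs; relies on Tuple's value-based Equals and GetHashCode.
    public static int CountDistinct2DPoints(double[][] data)
    {
        var seen = new HashSet<Tuple<double, double>>();
        foreach (double[] row in data)
        {
            seen.Add(Tuple.Create(row[0], row[1]));   // Add is a no-op for a pair already present
        }
        return seen.Count;
    }
}
```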

1 Comment

This worked for me :) It can be extended from 2D to more dimensions, though I don't know about the efficiency.
-1

Keep in mind a famous quote:

“Smart data structures and dumb code works a lot better than the other way around.”
—Eric Raymond, The Cathedral and the Bazaar

It sounds like the goal here is to be able to use simple and expressive code (.Distinct()) to compare your data. In that case, I'd recommend upgrading from simple arrays to richer objects. Something like this:

class Numbers
{
  public int FirstNumber { get; set; }
  public int SecondNumber { get; set; }
  public int ThirdNumber { get; set; }
}

Then you can have an array of these objects, instead of an array of arrays. The benefit here is that you can endow this object with richer functionality, such as equality comparison:

class Numbers : IEquatable<Numbers>
{
  public int FirstNumber { get; set; }
  public int SecondNumber { get; set; }
  public int ThirdNumber { get; set; }

  public bool Equals(Numbers other)
  {
    if (other == null)
      return false;
    return (
      this.FirstNumber == other.FirstNumber &&
      this.SecondNumber == other.SecondNumber &&
      this.ThirdNumber == other.ThirdNumber
    );
  }
}

Now your smarter data type can be used more effectively by dumber code. (Just referencing the quote, don't think of it as saying that you're writing dumb code or anything unhelpful like that.) This is generally preferred because it means that you don't have to re-write the comparison logic in multiple places if it needs to be used in multiple places. The comparison logic happens internal to the data type, not in procedural code.


Note that this is untested and freehand code. If I missed something in the implementation, please correct me :)
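One thing the freehand version leaves out: Distinct() groups elements by hash code before it calls Equals, so the class also needs a GetHashCode() override that agrees with Equals. A sketch of the completed class follows; the hash formula (seed 17, multiplier 31) is an assumption chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Numbers : IEquatable<Numbers>
{
    public int FirstNumber { get; set; }
    public int SecondNumber { get; set; }
    public int ThirdNumber { get; set; }

    public bool Equals(Numbers other)
    {
        if (ReferenceEquals(other, null)) return false;
        return FirstNumber == other.FirstNumber
            && SecondNumber == other.SecondNumber
            && ThirdNumber == other.ThirdNumber;
    }

    // Keep the non-generic Equals consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Numbers);
    }

    // Objects that are Equals must return the same hash code.
    public override int GetHashCode()
    {
        unchecked   // let the arithmetic wrap on overflow
        {
            int hash = 17;
            hash = hash * 31 + FirstNumber;
            hash = hash * 31 + SecondNumber;
            hash = hash * 31 + ThirdNumber;
            return hash;
        }
    }
}
```

Because the hash depends on settable properties, an instance should not be modified while it is stored in a hash-based collection such as a HashSet or a Dictionary key.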

3 Comments

Thank you, sir! This worked after I added a GetHashCode() method. I agree with that quote -- I admit I have a mostly functional programming background (C, assembly) and didn't study algorithms and data structures beyond an introductory course. Much appreciated!
Happened to answer me first with a solution that took minimal effort to implement. Other answers also seem to be correct, but David's suggestion to use richer data types instead of relying on procedural code was half of my question (what data type to use) and was enlightening.
Yes, I understand it might have solved your problem, but anyone coming back to this question won't find the answer where you've flagged it. All the other answers here are correct, some even with votes. They are much more useful to anyone coming later.
