Finding unique rows of jagged array

Question

I have a jagged array of strings and I need to find all the rows which are unique. For e.g.,

[ 
 ["A","B"] , 
 ["C","D","E"], 
 ["B", "A"],
 ["E","A"] 
]

This should return row 1 and row 3 as row 0 and row 2 are duplicates. How can this be done? Can I make use of hashets ?

Taken as arrays, row 0 and row 2 are not duplicates. They just have the same set of elements. — SWeko
– SWeko, Commented Dec 26, 2012 at 22:32
Yes, you can use HashSet. Either create a wrapper type for each Row or use an IEqualityComparer with the HashSet constructor. (Make sure to use the desired business rules: e.g. sort first before computing hash value or checking sequence equality.) — user166390
– user166390, Commented Dec 26, 2012 at 22:32
(Even if not using a HashSet, creating an IEqualityComparer is likely advisable and can be used with other approaches that require testing for the "equality" per the business rules.) — user166390
– user166390, Commented Dec 26, 2012 at 22:39
I have edited your title. Please see, "Should questions include “tags” in their titles?", where the consensus is "no, they should not". — John Saunders
– John Saunders, Commented Dec 26, 2012 at 22:51

SWeko · Accepted Answer · 2012-12-26 23:36:09Z

2

First of all, taken as arrays, row 0 and row 2 are not duplicates. They just have the same set of elements. However, if you just want to remove those kind of rows, you could do something like:

string[][] GetNonDuplicates(string[][] jagged)
{
  //not a hashset, but a dictionary. A value of false means that the row 
  //is not duplicate, a value of true means that at least one dulicate was found
  Dictionary<string[], bool> dict = 
          new Dictionary<string[], bool>(new RowEqualityComparer());

  foreach(string[] row in jagged)
  {
    //if a duplicate is found - using the hash and the compare method
    if (dict.ContainsKey(row)) 
    {
       dict[row] = true;  //set value to true
    }
    else
    {
      dict.Add(row, false);  //first time we see this row, add it
    }
  }

  //just pop out all the keys which have a value of false
  string[][] result = dict.Where(item => !item.Value)
                          .Select(item => item.Key)
                          .ToArray();
  return result;
}

...
string[][] jagged = new []{new []{"A","B"} , 
                           new []{"C","D","E"}, 
                           new []{"B", "A"},
                           new []{"E","A"}};

string[][] nonDuplicates = GetNonDuplicates(jagged);

where RowEqualityComparer is:

class RowEqualityComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] first, string[] second)
    {
        // different legths - different rows
        if (first.Length != second.Length)
          return false;

        //we need to copy the arrays because Array.Sort 
        //will change the original rows
        var flist = first.ToList();
        flist.Sort();
        var slist = second.ToList();
        slist.Sort();

        //loop and compare one by one
        for (int i=0; i < flist.Count; i++)
        {
            if (flist[i]!=slist[i])
              return false;
        }
        return true;
    }

    public int GetHashCode(string[] row)
    {
       //I have no idea what I'm doing, just some generic hash code calculation
       if (row.Length == 0)
         return 0;
       int hash = row[0].GetHashCode();
       for (int i = 1; i < row.Length; i++)
         hash ^= row[i].GetHashCode();
       return hash;
    }

}

edited Dec 26, 2012 at 23:36

answered Dec 26, 2012 at 23:05

SWeko

30.9k11 gold badges74 silver badges107 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Tim Schmelter Over a year ago

I assume that not only the order is irrelevant but also the Length is not meaningful when duplicates in the arrays are not counted(a HashSet would eliminate them).

SWeko Over a year ago

I'm working on the assumption that [A,B,B] and [A,A,B] would be considered different. In that case, this comparison makes sense. Else, a HashSet would be a correct approach.

Tim Schmelter · Accepted Answer · 2012-12-26 23:53:59Z

Assuming that you want to ignore the order, duplicates(since you've already mentioned a HashSet) and the result should contain only arrays which don't have duplicates.

You could implement a custom IEqualityComparer<String[]> for Enumerable.GroupBy and select only arrays which are unique(group-count==1):

class IgnoreOrderComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (x == null || y == null) return false;
        return !x.Distinct().Except(y.Distinct()).Any();
    }

    public int GetHashCode(string[] arr)
    {
        if (arr == null) return int.MinValue;
        int hash = 19;
        foreach (string s in arr.Distinct())
        {
            hash = hash + s.GetHashCode();
        }
        return hash;
    }
}

The rest is straightforward:

String[][] uniques = arrays.GroupBy(arr => arr, new IgnoreOrderComparer())
                           .Where(g => g.Count() == 1)
                           .Select(g => g.First())
                           .ToArray();

Edit: Here's a possibly more efficient version using the same comparer:

IEqualityComparer<string[]> comparer = new IgnoreOrderComparer();
String[][] uniques = arrays.Where(a1 =>
    !arrays.Any(a2 => a1 != a2 && comparer.Equals(a1, a2)))
                           .ToArray();

Sam I am says Reinstate Monica · Accepted Answer · 2012-12-26 22:41:29Z

1

As far as the algorithmic solution goes, I'd

sort your rows (you can use any sorting metric you like so long as it distinguishes any 2 different rows.)
pick the rows that don't have an identical adjacent row.

If you do that, you should be able to complete your requirement in O(m*n*lg(n)) where m is the length of your rows, and n is the number of rows

Given that sets of values implies equality, you can sort the cells of each row to help you sort the list of rows. this would result in O(n*m*lg(m) + m*n*lg(n))

edited Dec 26, 2012 at 22:41

answered Dec 26, 2012 at 22:33

Sam I am says Reinstate Monica

31.3k12 gold badges74 silver badges101 bronze badges

1 Comment

annantDev Over a year ago

Can you post some example?

DarthVader · Accepted Answer · 2012-12-26 23:14:27Z

0

I would compute the hash of each row as follows:

[ 
 ["A","B"] , // hash of this row :10 as example 
 ["C","D","E"], // hash of this row  : 20
 ["B", "A"], // hash of this row would be 10 as well
 ["E","A"] 
]

Since they are all string, you can calculate the hash values and create a hash per row.

The way you can use HashSet can be as follow, create hashset per row and then find difference of a row with every other row, if difference is empty then they are same.

You can use intersection as well, if the intersection is not empty that row is not unique.

answered Dec 26, 2012 at 23:14

DarthVader

55.6k78 gold badges225 silver badges309 bronze badges

Collectives™ on Stack Overflow

Finding unique rows of jagged array

4 Answers 4

2 Comments

Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related