8

Currently using JavaScript and I need to go through an array of arrays to determine if there are any duplicate arrays, and then deleting those duplicated arrays. Runtime is of the essence in this case, so I was wondering what the most EFFICIENT way of doing this is.

Is using a hash table desirable in this case? The scope of this would be to hash each sequence and then use the hash to determine whether that sequence occurs again. Hence, each sequence is an array within the master array, and any duplicates would be other arrays within the same array. Furthermore, it is extremely important that all individual arrays remain ordered themselves (i.e. the elements in the individual arrays must always keep their position). Also, all elements in the individual array are string values.

Example: Assume that there is an array A whose elements are in turn the following arrays:

A[0] = ["one", "two", "three", "four"]
A[1] = ["two", "one", "three", "four"]
A[2] = ["one", "two", "three", "four"]

In the above example, A[0] and A[2] are duplicates and so the function should return A[0] and A[1], such that there is only one instance of the same array.

6
  • 4
    good question, but where is your try? Commented Oct 8, 2014 at 15:16
  • Wondering what the ideal solution would be before implementing since time complexity is of the essence. Commented Oct 8, 2014 at 15:19
  • Efficiency here has two meanings: if coding the fastest algorithm will take one day but you are just checking 100 arrays, I'm not sure if that's efficient. Maybe a double for loop is enough. What are the sizes of A and A[n]? Commented Oct 8, 2014 at 15:23
  • Efficiency meaning the time taken by the algorithm to run, rather than the time taken to code the algorithm Commented Oct 8, 2014 at 15:24
  • so first of all - they are not duplicate. they are individual array literals that happen to serialise to the same but they are not the same array, as in ['one'] == ['one']; // false. you could write a function to compare them via lodash's _.isEqual - or you can use plain javascript to do it but it's not a 2-liner. especially if the real array is anything more complicated than a list of strings. if it's just strings/primitives, you could compare against item.join(','); Commented Oct 8, 2014 at 15:25

3 Answers 3

8

Keep an object where the keys are the joined elements of each array. If the key is not found add the array to the output array and add the key to the object.

var hash = {};
var out = [];
for (var i = 0, l = A.length; i < l; i++) {
  var key = A[i].join('|');
  if (!hash[key]) {
    out.push(A[i]);
    hash[key] = 'found';
  }
}

DEMO

Sign up to request clarification or add additional context in comments.

3 Comments

I will suggest using a different "join" character. Ideally it should be one not found in the input array's strings. If it should work for any input using JSON.stringify is an option.
['foo', 'bar'] is not the same as ['foobar'] but in your solution they would be considered equal.
Well spotted. I've chosen |.
2

Ok let us first have a look at the complexity of the naive solution: If there are n arrays, each with at most k entries, you need O(n^2 * k) comparisons, because for each of these n arrays, you have to compare it to n-1 others with k comparisons each. The space complexity is O(n*k)

So if you are willing to trade space for better performance, you can do the following: (Short disclaimer: I assume all your arrays have an equal number of k elements which is indicated but not approved by your question.)

Going one by one through the arrays, you pick the first element which we assume is a. Use a hash map to verify whether you saw this element as a first element before. If not, create a tree structure with a as its root, store it under a in your hash map and make it your current node. Now, for each subsequent entry in the current array, you check whether your current node has a child of that kind. So if the second entry is b, you add b to be a child of a.

Your tree now looks like that: (left to right: root to children)

a - b

Having c as the third entry works exactly the same:

a - b - c

Now we skip forward to have a look on an array [a, c, d]. You first encounter the tree for element a. For the second element, you check whether c is already a child of a. If not, add it:

  - b - c
a
  - c

same goes for the next entry:

  - b - c
a
  - c - d

Let us now see what happens when we check an array that we saw before: [a, b, c]

First we check a, see that there is already a tree and get it from the hash map. Next, we notice that ahas a child named b, so we descend to b. Now, for the last entry, we see that it is already there too, telling us that we encountered a duplicate which we can drop.

Sorry for the improvised drawing, I hope I can get the idea across. It is just about going through each array only once, storing it in a non-redundant way. So the time complexity would be O(n*k). The used space increases but is bounded by O(n*k) since the worst case is no array shared any prefix, which results in the same space complexity.

Hope I didn't overlook something.

2 Comments

Thanks for your input. Going about it using a tree structure hadn't actually crossed my mind. That being said, not all arrays have an equal number of elements.
You're welcome. Well, then my approach does not work anymore since you cannot distinguish whether a prefix of a path was encountered before...
1

ONELINER

A.filter((r={},a=>!(r[a]=++r[a]|0)))

I assume that your strings not contains , character. If contains then change twice r[a] to r[a.join('|')] (where | is arbitrary separator) or use r[a.map(x=>x.length+','+x)] to allow all characters in your strings. Here is working example.

Explanation

In r={} we set once temporary object. In filter function a=>... and is only for declare once empty temporary object in argument r={} . In function a=>... in a we have current A element . The JS make implicit cast a to string in r[a]. Then in !(r[a]=++r[a]|0) we increase counter of occurrence element a and return true (as filter function value) if element a appear first time.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.