7

I have an array of objects similar to the following:

var routeArr = [
    {start: 1, end: 2},
    {start: 1, end: 3},
    {start: 1, end: 4},
    {start: 2, end: 1},
    {start: 3, end: 1},
    {start: 4, end: 1}
];

These objects represent the start and end point of lines and as such, {start: 1, end: 2} and {start: 2, end: 1} represent the same line.

I am trying to remove all duplicate lines from the array and cannot find an efficient or elegant way to do it. I have tried a nested loop but, I've been told that is bad practice (and I'm getting errors with my implementation, and it's just ugly).

for(var i = 0, numRoutes = routeArr.length; i < numRoutes; i++) {
    var primaryRoute = routeArr[i];

    for(var j = 0; j < numRoutes; j++) {
        var secondRoute = routeArr[j];

        if(primaryRoute.start === secondRoute.end && primaryRoute.end === secondRoute.start) {
            routeArr.splice(j, 1);
            continue;
        }
    }
}

Can anyone offer suggestions?

2
  • The normal way for doing this is: sort it firstly (in your case, you should reverse the start and end if (end > start)). Then the duplicate lines will be exactly the same with each other. Then just loop to remove the duplicate one Commented May 23, 2016 at 22:43
  • 1
    When you are removing an element of array never run loop from 0 to length. it is not safe because after removal you have to adjust your indices. Better run loop in descending order i.e. from length-1 to 0, this will work in case you will remove an element of array and never will come back elements with bigger indices. Also your if statement checks only one condition of being an identical lines, you have to add other check with or statement as well. Commented May 23, 2016 at 23:03

5 Answers 5

3

Create an object/map in javascript and keep the indexes of the unique objects, store "min(start,end):max(start,end)" as a key and index as a value. Here is an implementation of your question in javascript:

// your initial array
var routeArr = [
    {start: 1, end: 2},
    {start: 1, end: 3},
    {start: 1, end: 4},
    {start: 2, end: 1},
    {start: 3, end: 1},
    {start: 4, end: 1}
];

// map where we will store key => value where key is a joined start,end of your array's item and value is an item index 
var keyToRouteIndexMap = {};

for (var i in routeArr){
    // calculating min and max from start and end to understand {start:1, end:2} and {start:2, end:1} object as duplicates
    var min = Math.min(routeArr[i].start,routeArr[i].end);
    var max = Math.max(routeArr[i].start,routeArr[i].end);
    // unique key 
    var key = min+':'+max;
    if (!keyToRouteIndexMap.hasOwnProperty(key)){
        keyToRouteIndexMap[key] = i;
    }
}

for(var key in keyToRouteIndexMap){
    if(keyToRouteIndexMap.hasOwnProperty(key)){
        console.log(routeArr[keyToRouteIndexMap[key]]);
    }
}
Sign up to request clarification or add additional context in comments.

6 Comments

Note that if you are using ES6, you can instead use the Set object
There's also a few javascript implementations built to imitate hashsets: github.com/timdown/jshashtable
@Hamms: How would the Set object help? It only compares object references.
@Vahan Simonyan: You should probably check for hasOwnProperty() when iterating over object keys.
@le_m you obviously couldn't use the objects themselves as the elements of the Set, but you could construct a key the same way this example is and simply put it in a set rather than re-purposing an object to act as a set.
|
2

You can do like this. I guess this is very fast since there are no searches at all. One Array.prototype.reduce() operation to construct both the hash table (lookup table) and the reduced object at the same time. Then mapping the object keys to get the result. Here it is;

var routeArr = [
    {start: 1, end: 2},
    {start: 1, end: 3},
    {start: 1, end: 4},
    {start: 2, end: 1},
    {start: 3, end: 1},
    {start: 4, end: 1}
],

reduced = routeArr.reduce((p,c) => {!(p[c.start+"-"+c.end] || p[c.end+"-"+c.start]) && (p[c.start+"-"+c.end] = c);
                                     return p;},{}),
 result = Object.keys(reduced).map(e => reduced[e]);
console.log(result);

Well giving it a second thought i eliminated the redundant Object.keys() portion. Now this is nothing more than a single Array.prototype.reduce() pass all completed in just O(n). I suppose this might be as far as it gets concerning the performance. Check it out.

var routeArr = [
    {start: 1, end: 2},
    {start: 1, end: 3},
    {start: 1, end: 4},
    {start: 2, end: 1},
    {start: 3, end: 1},
    {start: 4, end: 1}
],

     reduced = routeArr.reduce((p,c) => {!(p[c.start+"-"+c.end]  ||
                                           p[c.end+"-"+c.start]) &&
                                          (p[c.start+"-"+c.end] = true,
                                           p.result.push(c));
                                           return p;
                                        },{"result":[]});
console.log(reduced.result);

Well ok yes i have to agree it looks a little cryptic but it is very simple.

  • We are using Array.prototype.reduce() method with an initial value here. This is our initial value {"result":[]}. When reducing our routeArr array our initial element to start with is now an object with a single property named result and value of an empty array.
  • reduce has been provided with an anonymous callback function which takes two arguments (p,c) p stands for previous and c stands for current. So in the first run p is our initializing object, i mean this {"result":[]} and c is the item at index 0 of the array (routeArr) that we have called reduce upon. So in the first round c is {start: 1, end: 2}.
  • In the beginning of every round we check if our p object contains a property which represent the current elements values in both orders. So the check comes like this !(p[c.start+"-"+c.end] || p[c.end+"-"+c.start]) which in human terms means "is it true that you don't have a string property like c.start-c.end or c.end-c.start".. So for example in the first round the check is like "is it true that you don't have a string property like "1-2" or "2-1". If it has (false) we do nothing but if it hasn't we perform the following actions;
  • && (p[c.start+"-"+c.end] = true, p.result.push(c)); return p;. OK the first && ties the two instructions in the parens to the condition of the previous instruction to evaluate to true. In a && b instruction JS engine will only evaluate b if a evaluates to true. So you got it. Again in human terms this is what happens. "is it true that you don't have a string property like "1-2" or "2-1" turns true and we create a property "1-2" with a value true. So in next rounds if we meet a 1-2 or 2-1 we will do nothing at all. Then we push this current object to the result property of the same object (p.result) to become a unique representative of all of it's duplicates or twins. Then we return p for a healthy continuation of the reduce cycles.

I hope it is clear.

3 Comments

Beautiful functional solution. Now I would really like to see a performance comparison with the non-functional approaches.
I think now you overdid it a bit with the code-minification. Chosing self-documenting variable names and not putting assignments into comparisons might help OP to better understand your nice solution :)
@le_m yes i guess you might be right. I will put some explanation below. To me this looks like a poem though. :)
2

Here is a general solution to the problem of removing duplicate values from javascript arrays:

/**
 * Takes an input array and returns a new array without identical elements.
 *
 * @param {array} input
 * @callback id   identity function returning identical values for identical elements
 */
function uniquify(input, id) {
    result = [];
    map = {};
    for (var i = 0, length = input.length; i < length; ++i) {
        var element = input[i], identity = id(element);
        if (!map.hasOwnProperty(identity)) {
            result.push(element);
            map[identity] = true;
        }
    }
    return result;
}

Applied to your given routeArr:

var routeArr = [
    {start: 1, end: 2},
    {start: 1, end: 3},
    {start: 1, end: 4},
    {start: 2, end: 1},
    {start: 3, end: 1},
    {start: 4, end: 1}
];

routeArr = uniquify(routeArr, function(route) {
    return route.start < route.end ? '' + route.start + ':' + route.end : '' + route.end + ':' + route.start;
});

1 Comment

I guess the callback function maps {start: 2, end: 3} and {start: 6, end: 1} to the same location.
2

Your nested loop methodology is 'ugly'- but that isn't your issue.

Your implementation errors are resulting from the fact that both of your for loops assume the array structure won't change as you're mutating it, which is causing you to skip over some items in the array.

'i' and 'j' are 'stupid' incrementers - That for loop isn't telling the code to go to the next item in the array with each iteration, it's telling it to go to (array[last_index_i_used+1] - So when you splice something the array you're looking at changes, and the next item in line gets passed over.

I see a lot of fancy array methods and ES6 suggestions, but I assume from your question that you are still a bit new to JS, and could use some time building fundamentals (no offense intended).

Try a recursive decrementing function:

function uniquify(inputArray, ind){
    var checkStart = inputArray[ind].start, checkEnd =inputArray[ind].end
    for (var i=(ind-1);i > -1; --i){
        var thisStart = inputArray[i].start, thisEnd = inputArray[i].end
        if ((thisStart == checkStart || thisStart == checkEnd) && (thisEnd == checkStart || thisEnd == checkEnd)){

            inputArray.splice(i,1)
        }
    }

    --ind
    if (ind > -1){
        uniquify(inputArray,ind)
    }
}
uniquify(routeArr,routeArr.length -1);

I like that better than a nested for loop as you are never hitting the same value more often than you need to, which keeps performance consistent regardless of the size of your array.

But you might want to ask yourself if whatever is defining 'routeArr' is doing whatever it's doing in a way that is intelligent - At best, it seems like it's wasting memory and CPU storing data in an inefficient way.

3 Comments

I recommend to use semicolons everywhere even though they are not always needed in JS. Also, your loop index i is a global variable, better make it local with var.
You are correct in assuming that I am new to JS, at least in a project of this size. Long story short, we're rewriting an old C++ app to run on iPad and Android. The decision was made to use HTML5 and Javascript with Cordova (which I supported at first but, now have my misgivings about). Unfortunately, some design decisions were made in the previous C++ version that forces us to work with the application data in unconventional ways. I'm trying to find an elegant and efficient solution. Thank you for your response, I'm trying it as we speak.
@le_m - Thanks for catching the lack of var in my loop. Edited to fix.
1

I've wrote the function below to do it neatly

var routeArr = [{
  start: 1,
  end: 2
}, {
  start: 1,
  end: 3
}, {
  start: 1,
  end: 5
}, {
  start: 2,
  end: 1
}, {
  start: 3,
  end: 1
}, {
  start: 4,
  end: 1
}];

routeArr.IsDuplicate = function(obj) {
    var i = this.length;
    var count = 0 
    while (i--) {
        if ((this[i].start === obj.start && this[i].end === obj.end ) || (this[i].start === obj.end && this[i].end === obj.start) ) {
            count++;
        }
    }
    return count>1;
}

for(var i = routeArr.length-1; i--;){
    if (routeArr.IsDuplicate(routeArr[i])) routeArr.splice(i, 1);
}

2 Comments

This is operationally inefficient. It requires you to evaluate each pair multiple times. If you increase the routeApp length to 19 for example, 189 evaluations are made. Map seems to be a cleaner approach. In Java, a Hashmap would be a great data structure for this type of implementation. github.com/timdown/jshashtable is a hashset implementation in javascript.
javascript objects already offer a 'hashmap' implementation, the only issue is that keys cannot be objects - so you need a 'hash' function either by a general mapping of objects to hashes (your link) which is inefficient as you would need to iterate through the prototype chain or by a more performant user-supplied function.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.