Edit:

Context: I inherited a process (from a former co-worker) that generates a generic file that, among other things, creates the following list of items. The list will later need to be turned into a series of unordered links with nesting levels preserved.

From the following array, I need to remove duplicates based on the href attribute's value, regardless of how many times a value shows up.

var array = [
 '<tag href="cheese.html">',
 '<tag href="cheddar.html"></tag>',
 '  <tag href="cheese.html"></tag>',
 '</tag>',
 '<tag href="burger.html">',
 ' <tag href="burger.html">',
 '   <tag href="burger.html"></tag>',
 ' </tag>',
 '</tag>',
 '<tag href="lettuce.html">',
 '  <tag href="lettuce.html">',
 '    <tag href="lettuce.html"></tag>',
 '  </tag>',
 '</tag>',
 '<tag href="tomato.html">',
 '  <tag href="tomato.html"></tag>',
 '  <tag href="tomato.html">',
 '    <tag href="tomato.html"></tag>',
 '    <tag href="tomato.html">',
 '      <tag href="tomato.html"></tag>',
 '      <tag href="tomato.html">',
 '        <tag href="tomato.html"></tag>',
 '      </tag>',
 '    </tag>',
 '  </tag>',
 '</tag>',
];

After the array has all duplicates removed, it should look like this:

'<tag href="cheese.html">',
'<tag href="cheddar.html"></tag>',
'</tag>',
'<tag href="burger.html">',
'</tag>',
'<tag href="lettuce.html">',
'</tag>',

From here, I have no problems extracting the info I need to generate my unordered list of links. I just need help figuring out how to remove the duplicates.

  • Why do you end up with two </tag> values? Commented Mar 9, 2017 at 0:54
  • One tag element is nested within another. Commented Mar 9, 2017 at 15:26

2 Answers

It would be helpful to know the context of your problem.

This function keeps the first string for each unique href value, but does nothing about the closing tags. Removing the matching closing tags is a more complex task, since it requires tracking nesting. Plus I'm pretty sure parsing HTML with regex is not a good idea.

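// Despite the name, this filters the array; it does not sort it
function sortByHref (array) {
  // Capture only up to the next quote, so other attributes are not swallowed
  var hrefReg = /href="([^"]*)"/;
  var seen = {};
  return array.filter(function (x) {
    var match = hrefReg.exec(x);
    if (match) {
      var href = match[1];
      // Drop the string if this href value has already been kept
      if (seen.hasOwnProperty(href) && seen[href]) return false;
      seen[href] = true;
    }
    // Strings without an href (closing tags) always pass through
    return true;
  });
}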

There has to be another way to solve your problem; it would help if you described exactly what you are trying to accomplish.
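One possible way to handle the closing tags without a full parser is to track nesting with a stack. The sketch below is only that: it assumes every line is either an opening tag, a self-closed `<tag ...></tag>`, or a bare `</tag>`, as in the question's array. The name `dedupeByHref` is hypothetical and not part of the original code.

```javascript
// Hypothetical helper: removes lines whose href was already seen and
// also removes the closing tag that belongs to each removed opening tag.
// Assumes each line is an opening tag, a self-closed tag, or '</tag>'.
function dedupeByHref(lines) {
  var hrefPattern = /href="([^"]*)"/;
  var seen = {};
  var stack = []; // one boolean per open tag: was its opening line kept?
  var result = [];

  lines.forEach(function (line) {
    var text = line.trim();

    if (text === '</tag>') {
      // Keep the closing line only if its opening line was kept.
      // An unmatched closing tag pops undefined and is dropped.
      if (stack.pop()) result.push(line);
      return;
    }

    var match = hrefPattern.exec(text);
    var keep = true;
    if (match) {
      keep = !Object.prototype.hasOwnProperty.call(seen, match[1]);
      seen[match[1]] = true;
    }

    // A line ending in '</tag>' closes itself and opens no new level.
    var selfClosed = /<\/tag>$/.test(text);
    if (!selfClosed) stack.push(keep);
    if (keep) result.push(line);
  });

  return result;
}

var sample = [
  '<tag href="cheese.html">',
  '<tag href="cheddar.html"></tag>',
  '  <tag href="cheese.html"></tag>',
  '</tag>',
  '<tag href="burger.html">',
  ' <tag href="burger.html">',
  '   <tag href="burger.html"></tag>',
  ' </tag>',
  '</tag>'
];

console.log(dedupeByHref(sample));
// → [ '<tag href="cheese.html">',
//     '<tag href="cheddar.html"></tag>',
//     '</tag>',
//     '<tag href="burger.html">',
//     '</tag>' ]
```

Kept lines retain their original indentation, and the approach depends on the input being balanced; it is not a substitute for a real HTML parser.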

3 Comments

Very nice and elegant solution.
Works well but like you said, it doesn't do anything with the closing tags.
I think I found a solution that extends what you did: create a second array, loop through the cleaned array returned by your function, and push every element except those that match this: cleanedArray[i].indexOf(' </tag>') > -1. In my tests, this removes any closing tag element that has a space in front of it. I'll run deeper tests and confirm whether this works or not. Cheers!
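The idea in the comment above might look roughly like the following sketch. The helper names `keepFirstHref` and `dropIndentedClosers` are hypothetical; only the `indexOf(' </tag>')` test comes from the comment. It relies on every nested closing tag being indented by at least one space, which is an assumption about the generated file rather than something guaranteed.

```javascript
// Hypothetical helper: keep only the first line for each href value.
function keepFirstHref(lines) {
  var seen = {};
  return lines.filter(function (line) {
    var match = /href="([^"]*)"/.exec(line);
    if (!match) return true; // closing tags pass through untouched
    if (seen[match[1]] === true) return false;
    seen[match[1]] = true;
    return true;
  });
}

// Hypothetical helper: drop closing tags that have a space in front of them.
function dropIndentedClosers(lines) {
  return lines.filter(function (line) {
    return line.indexOf(' </tag>') === -1;
  });
}

var sample = [
  '<tag href="burger.html">',
  ' <tag href="burger.html">',
  '   <tag href="burger.html"></tag>',
  ' </tag>',
  '</tag>'
];

console.log(dropIndentedClosers(keepFirstHref(sample)));
// → [ '<tag href="burger.html">', '</tag>' ]
```

If a nested closing tag ever appears without leading whitespace, this heuristic keeps it, so it is worth checking against the real generated file.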

Here is a purposely verbose solution for easier understanding. I am assuming that for tags without an href value, duplicates are simply removed based on the whole string.

var arr = [
    '<tag href="cheese.html">',
    '<tag href="cheddar.html"></tag>',
    '  <tag href="cheese.html"></tag>',
    '</tag>',
    '<tag href="burger.html">',
    ' <tag href="burger.html">',
    '   <tag href="burger.html"></tag>',
    ' </tag>',
    '</tag>'
];

// Remove whitespaces on both ends from each string in array
// Not a necessary step, but will just handle leading and trailing whitespaces this way for convenience
arr = arr.map(function(tagString) {
    return tagString.trim(); 
}); 

// Regex to retrieve href value from tags
var hrefRegexp = /(\s+href=\")([^\"]+)(\")/g;

// Create an array with just the href values for easier lookup
var hrefArr = arr.map(function(tagString) {
    // Run regex against the tag string
    var href = hrefRegexp.exec(tagString); 

    // Reset `RegExp`'s index
    hrefRegexp.lastIndex = 0; 

    // If no href match is found, return null, 
    if (href === null) return null; 

    // Otherwise, return the href value
    else return href[2]; 
});

// Store array length (this value will be used in the for loop below)
var arrLength = arr.length; 

// Begin from the left and compare values on the right
for (var leftCompareIndex = 0; leftCompareIndex < arrLength; leftCompareIndex++) {
    for (var rightCompareIndex = leftCompareIndex + 1; rightCompareIndex < arrLength; rightCompareIndex++) {

        // A flag variable to indicate whether the value on the right is a duplicate
        var isRightValueDuplicate = false; 

        // If href value doesn't exist, simply compare whole string
        if (hrefArr[leftCompareIndex] === null) {
            if (arr[leftCompareIndex] === arr[rightCompareIndex]) {
                isRightValueDuplicate = true; 
            }
        }

        // If href value does exist, compare the href values
        else {
            if (hrefArr[leftCompareIndex] === hrefArr[rightCompareIndex]) {
                isRightValueDuplicate = true; 
            }
        }

        // Check flag and remove duplicate element from both original array and href values array
        if (isRightValueDuplicate === true) {
            arr.splice(rightCompareIndex, 1); 
            hrefArr.splice(rightCompareIndex, 1); 
            arrLength--; 
            rightCompareIndex--; 
        }
    }
}

console.log(arr); 

/* Should output
[ '<tag href="cheese.html">',
  '<tag href="cheddar.html"></tag>',
  '</tag>',
  '<tag href="burger.html">' ]
  */

1 Comment

I like the solution but it doesn't add in the last closing tag for <tag href="burger.html">.
