I have an array of similar strings, such as this list of landmarks:

["AB Street", "A B Street", "AB Street XE", "AB Street X", "AB Street(XE)"]

Each of these represents a single landmark, "AB Street".

I have tried different approaches and found a way to remove extra spaces and special characters, but I cannot figure out how to drop the extraneous entries whose extended names all refer to the same string.

Code snippet for removing spaces and special characters:

var landmarks = ["AB Street", "A B Street", "AB Street XE", "AB Street X", "AB Street(XE)"];
var formattedLandmarks = [];

landmarks.sort();
landmarks.forEach(function(location) {
  var key = location && location.toLowerCase();
  key = key.replace(/[.\/-]*/g, "");
  key = key.replace(/\(.*\)/i, "");
  key = key.replace(/[0-9, _-]*$/, "");
  key = key.replace(/[ \t]+/g, " ");
  key = key.toString().trim();
  key = key.charAt(0).toUpperCase() + key.slice(1);

  formattedLandmarks.push(key);
});

console.log(formattedLandmarks);

I expect the algorithm to return an array with only one entry:

["AB Street"]

It would be great if someone could suggest the best approach and algorithm to achieve the expected output, whether through RegExp or some other way.

Any help is appreciated.

  • So the crucial pattern should consist of two words? What if the initial array looks like ["ABStreet", "AB Street XQ", "AB Street XEA", "AB Street(XE)"]? What should the expected result be for it? Commented Jul 12, 2016 at 10:52
  • I think you need to be a bit more specific about 'similar kind of string'. To a human, yes, we can easily see that 'AB Street' is probably a valid 'similar kind of word'. Unfortunately, a machine isn't as smart as we are when it comes to pattern recognition; we have to tell it what sort of pattern to look for. In your example, 'AB' could potentially be a similar kind too, as it appears in the other strings as well. Commented Jul 12, 2016 at 10:56
  • Not necessarily two words. And for the array you asked about, I would expect "ABStreet". @RomanPerekhrest Commented Jul 12, 2016 at 10:59
  • @SamuelToh I agree with what you are saying, and that is exactly where my confusion lies: should the result come from trying different permutations and combinations, or should I take the shortest or the longest entry? Which would be the best practice? Commented Jul 12, 2016 at 11:01
  • You're likely to end up with a function that takes 2 strings as input and returns a similarity score in [0...1], but you'll still have to decide on the 'decent enough score' threshold. Maybe something close to the PHP function similar_text(), which has apparently been ported to JS. Not sure if this is the right algorithm for your needs, though. Commented Jul 12, 2016 at 11:10
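
To make the last comment concrete, here is a minimal sketch of a function that takes two strings and returns a score in [0, 1]. It is not a port of PHP's similar_text(); it swaps in a normalized Levenshtein edit distance, which is only one possible choice. The names editDistance, similarity, and THRESHOLD are hypothetical, and the 0.8 cut-off is an assumption you would have to tune for your data.

```javascript
// Levenshtein edit distance between two strings (row-by-row dynamic programming).
function editDistance(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      // Cheapest of: delete from a, insert into a, substitute (or match).
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical strings, 0 means maximally different.
function similarity(a, b) {
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings count as identical
  return 1 - editDistance(a, b) / longest;
}

var THRESHOLD = 0.8; // hypothetical cut-off, not taken from the thread
console.log(similarity("abstreet", "abstreetx") >= THRESHOLD);
```

The strings passed in are assumed to be already normalized (lower-cased, spaces and special characters removed), as in the question's snippet.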

1 Answer

You can try something like this:

Logic

  • Sort the array in ascending order
  • Set initial value to be blank
  • Loop over the array and check whether the current value contains the previous one. If not, push it into the result array.

Note: You are comparing parsed values, so you should sort based on those parsed values as well.

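var array = ["AB Street", "A B Street", "AB Street XE", "AB Street X", "AB Street(XE)"];
var regex = /[^a-z]/gi; // matches every character that is not a letter

var final = [];
// Sort by the parsed (letters-only) values so a base name precedes its longer variants.
array.sort(function(item1, item2){
  var _a = item1.replace(regex,"");
  var _b = item2.replace(regex,"");
  return _a > _b? 1: _a < _b ? -1: 0;
}).reduce(function(currentItem, nextItem) {
  // _p is the previous parsed value, _c is the current parsed value.
  var _p = currentItem.replace(regex, "");
  var _c = nextItem.replace(regex, "");
  // Keep the item if it is the first one or does not contain the previous value.
  if (_c.indexOf(_p)<0 || !currentItem) {
    final.push(nextItem);
  }
  return nextItem;
}, "");

console.log(final); // logs ["AB Street"] for this input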
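
One thing to watch for: the reduce above compares each item with the item immediately before it, not with the last item that was kept. With an input such as ["AB Street", "AB Street X", "AB Street Y"], "AB Street Y" does not contain "AB Street X" and would be pushed. A possible variant, sketched under the assumption that an entry is a duplicate whenever its parsed value starts with the parsed value of the last kept entry, could look like this. The name dedupeLandmarks and the lower-casing step are additions for illustration, not part of the code above.

```javascript
// Sketch: keep an entry only when its parsed key does not start with
// the key of the most recently kept entry.
function dedupeLandmarks(names) {
  var nonLetters = /[^a-z]/gi;
  // Pair each name with its parsed key (letters only, lower-cased).
  var entries = names.map(function (name, index) {
    return { name: name, key: name.replace(nonLetters, "").toLowerCase(), index: index };
  });
  // Sort by key so a base name precedes its longer variants;
  // ties fall back to the original position to keep the order deterministic.
  entries.sort(function (a, b) {
    if (a.key !== b.key) return a.key > b.key ? 1 : -1;
    return a.index - b.index;
  });
  var kept = [];
  var lastKeptKey = null;
  entries.forEach(function (entry) {
    if (entry.key === "") return; // skip names that contain no letters
    if (lastKeptKey === null || entry.key.indexOf(lastKeptKey) !== 0) {
      kept.push(entry.name);
      lastKeptKey = entry.key;
    }
  });
  return kept;
}

console.log(dedupeLandmarks(["AB Street", "A B Street", "AB Street XE", "AB Street X", "AB Street(XE)"]));
// ["AB Street"]
console.log(dedupeLandmarks(["AB Street", "AB Street X", "AB Street Y", "CD Road"]));
// ["AB Street", "CD Road"]
```

A prefix rule like this is still a heuristic: a short entry such as "AB" would absorb every name that begins with it, so whether it fits depends on the data.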

5 Comments

Can you please give meaningful names to the variables that are currently named a, b, c? I am not able to follow the code properly.
Really great answer, impressive! @PrernaJain To understand the regex, use regexr.com. To understand sort and reduce, search for Array.prototype.sort() and Array.prototype.reduce(). p means previous, c means current.
@PrernaJain I have updated my answer and added reference links. @TheNinja, thanks.
@Rajesh Thanks a lot, man. The algorithm worked perfectly, and as far as I can tell it is among the best approaches.
@TheNinja Thanks for mentioning the references. I was already aware of array prototypes and regex; I just asked for the variables to be named to make the code more understandable, even for a novice.
