This query has been bothering me for the past 10 hours. Here we go:

I want to compare some of the data I am pulling. I am pulling names, and I want to drop names that are similar to one already returned so that they do not come back from the query.

Example:

I have the following names:

  • Seaside Heights
  • Seaside HGTS
  • Talladega
  • Tornkal Center
  • Tornkal CTR
  • Yonkers
  • Zebraville

I want it to return like this:

  • Seaside Heights
  • Talladega
  • Tornkal Center
  • Yonkers
  • Zebraville

Basically, I think it should be substring(name, 0, 8) to get the first 8 characters, then compare those 8 characters against the next entry and, if they match, ignore that entry.
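A minimal sketch of that prefix idea, written in Python purely for illustration (the function name `drop_similar_by_prefix` and the case-insensitive sort are assumptions, not part of the original setup). It sorts the names, then skips any name whose first 8 characters match those of the last name kept:

```python
def drop_similar_by_prefix(names, prefix_len=8):
    """Keep a name only if its prefix differs from the last kept name's prefix."""
    kept = []
    # Sorting puts names with a shared prefix next to each other.
    for name in sorted(names, key=str.lower):
        prefix = name[:prefix_len].lower()
        if kept and kept[-1][:prefix_len].lower() == prefix:
            continue  # same first characters as the previous kept name
        kept.append(name)
    return kept


names = [
    "Seaside Heights", "Seaside HGTS", "Talladega",
    "Tornkal Center", "Tornkal CTR", "Yonkers", "Zebraville",
]
print(drop_similar_by_prefix(names))
```

Which of two similar names survives depends only on sort order, so this would also merge unrelated names that happen to share their first 8 characters.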

Maybe I am thinking way too deep into this. Any insight or concepts that might work would be appreciated.

  • Does it matter if you have PHP or MySQL do this for you? Commented Mar 3, 2012 at 4:58
  • What is the relation of the output to the input? What would you get for substring(name, 0, 8)? Commented Mar 3, 2012 at 4:59
  • Do you need to compare only the part before the space? Commented Mar 3, 2012 at 5:01
  • You only want to ignore an entry if its first 8 characters match the first 8 characters of another entry? What if the first 7 characters match? Commented Mar 3, 2012 at 5:04
  • I don't think this question deserves any upvotes as it's way too ambiguous and obscure Commented Mar 3, 2012 at 5:19

4 Answers

1

First, you would query all the data.

Then, for every pair of records returned, run the LCS (Longest Common Subsequence) algorithm.

If the length of the longest common subsequence between two different records reaches a threshold of your choosing, you can class them as similar.

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Edit: It just so happens there's a nice PHP function for measuring string similarity, similar_text: http://php.net/manual/en/function.similar-text.php
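A rough sketch of the LCS approach, in Python for illustration only. The helper names, the 0.8 threshold, and the choice to divide by the shorter name's length are all assumptions for this example, not something the question specifies:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_similar(a, b, threshold=0.8):
    """Assumed rule: LCS covers at least `threshold` of the shorter name."""
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    if shorter == 0:
        return False
    return lcs_length(a, b) / shorter >= threshold


def drop_similar(names, threshold=0.8):
    """Keep a name only if it is not similar to any name already kept."""
    kept = []
    for name in names:
        if not any(is_similar(name, other, threshold) for other in kept):
            kept.append(name)
    return kept


names = [
    "Seaside Heights", "Seaside HGTS", "Talladega",
    "Tornkal Center", "Tornkal CTR", "Yonkers", "Zebraville",
]
print(drop_similar(names))
```

The threshold needs tuning against real data: short names can share a surprising number of letters with longer ones ("Yonkers" and "Tornkal Center" already score about 0.71 under this rule), and comparing every pair is quadratic in the number of records.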



1

Try the query below, if the differences between the strings are like the ones in your example:

 SELECT names FROM tablename GROUP BY SUBSTRING_INDEX(names, ' ', 1)

2 Comments

Group by with no aggregate functions?
MySQL, however, allows this (when the ONLY_FULL_GROUP_BY SQL mode is not enabled) and simply chooses one value from each group.
0

You might want to take a look at soundex. It won't be perfect, but it could get you in the ballpark.
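To show what that might look like, here is a sketch in Python of the classic four-character American Soundex, plus a hypothetical `drop_same_soundex` helper that keeps the first name seen for each code. MySQL's SOUNDEX() and PHP's soundex() are the built-ins you would more likely use in practice; MySQL's version can return longer codes, so its output may not match this sketch exactly.

```python
# Letter groups of the classic Soundex scheme.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Four-character Soundex code; non-letters (spaces etc.) are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev_code:
            result += code
            if len(result) == 4:
                break
        prev_code = code  # vowels reset this to ""
    return result.ljust(4, "0")


def drop_same_soundex(names):
    """Keep only the first name seen for each Soundex code."""
    seen, kept = set(), []
    for name in names:
        code = soundex(name)
        if code not in seen:
            seen.add(code)
            kept.append(name)
    return kept


names = [
    "Seaside Heights", "Seaside HGTS", "Talladega",
    "Tornkal Center", "Tornkal CTR", "Yonkers", "Zebraville",
]
print(drop_same_soundex(names))
```

With only four characters, the code is mostly decided by the first word, so two different places that start with the same word would collide.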


0

If the differences between strings are limited to a small set of abbreviations (HGTS <-> Heights, CTR <-> Center, etc), you might just want to keep a table of those and replace the abbreviations with the full versions, then check for uniqueness.
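As a sketch of that approach (Python for illustration; the `ABBREVIATIONS` mapping stands in for the lookup table and contains only the two pairs from the question, and the function names are made up for this example):

```python
# Stand-in for the abbreviation table; extend with whatever pairs your data uses.
ABBREVIATIONS = {"HGTS": "Heights", "CTR": "Center"}


def normalize(name):
    """Expand known abbreviations word by word and lowercase the result."""
    words = name.split()
    return " ".join(ABBREVIATIONS.get(word.upper(), word) for word in words).lower()


def drop_abbreviated_duplicates(names):
    """Keep the first name seen for each normalized form."""
    seen, kept = set(), []
    for name in names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


names = [
    "Seaside Heights", "Seaside HGTS", "Talladega",
    "Tornkal Center", "Tornkal CTR", "Yonkers", "Zebraville",
]
print(drop_abbreviated_duplicates(names))
```

Unlike the fuzzy approaches, this only merges names you have explicitly listed as equivalent, so it cannot produce false matches, but it needs the table to be maintained.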

