My PHP script needs to check for matches throughout an array of data. It's currently looking for exact string matches. I'd like it to be less strict.

For example, if the array holds the string "Tom and Jerry", I would like to return true for "Tom Jerry", "Tom & Jerry", and maybe even "Tom and Jery". I found links to PHP search engines, but they are more complex than what I need. My data is fairly small and dynamic, so there's no indexing.

I know I could write a big hairy regular expression, but I'm pretty sure I would be reinventing the wheel, because I'm sure others have already done this. Any advice on where to look or how to approach this would be much appreciated.

EDIT: To clarify, I'm trying to avoid entering all the dynamically generated data into a DB.

2
  • How small or dynamic is your data? Is it feasible to develop a manual list of alternatives of spelling variations? Because a computer would also come up with Berry which may not be what you want. Commented Jun 7, 2012 at 18:41
  • The data is an array of Facebook profiles returned by the Graph API. My script searches the employer names in those profiles against a user-provided search query. Commented Jun 7, 2012 at 18:54

3 Answers

1

If the data were in MySQL, you could use a full-text search. This is quite easy to develop; the question is whether it would be too heavyweight a solution.

1 Comment

The data isn't in MySQL because it's returned by the Facebook Graph API. In theory I could dump it into the database, but that seems like overkill, since the data is specific to each user.
1

It may require some trial and error but you could do:

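  • Make a manual list of words that may be absent, such as 'and', 'in', 'of', et cetera (as in your Tom Jerry example), and ignore them when comparing.
  • Compute the Hamming distance between the string and the search query. Strictly speaking, Hamming distance is only defined for strings of equal length, so in practice an edit distance such as Levenshtein is the usual substitute. If the distance is low (perhaps at most one or two), return true.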
  • Otherwise, return false.
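
The steps above can be sketched in PHP roughly as follows. This is a minimal sketch under stated assumptions, not a tested library: the function names `normalize_name`, `fuzzy_match`, and `fuzzy_in_array`, the stop-word list, and the threshold of 2 are all hypothetical choices made for illustration. It uses PHP's built-in `levenshtein()` rather than a strict Hamming distance, and it assumes ASCII-only names.

```php
<?php
// Hypothetical helper: lowercase, strip punctuation, and drop filler words.
// Assumes ASCII input; non-ASCII letters would be stripped by the regex.
function normalize_name(string $s, array $stopWords = ['and', 'in', 'of', 'the']): string
{
    $s = strtolower($s);
    // Replace anything that is not a letter, digit, or whitespace with a space.
    $s = preg_replace('/[^a-z0-9\s]+/', ' ', $s);
    // Split on runs of whitespace, discarding empty pieces.
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    // Remove the filler words, then rejoin with single spaces.
    return implode(' ', array_diff($words, $stopWords));
}

// Hypothetical matcher: true when the normalized strings are within
// $maxDistance single-character edits of each other.
function fuzzy_match(string $query, string $candidate, int $maxDistance = 2): bool
{
    $distance = levenshtein(normalize_name($query), normalize_name($candidate));
    return $distance <= $maxDistance;
}

// Return true if any entry in $haystack fuzzily matches $query.
function fuzzy_in_array(string $query, array $haystack, int $maxDistance = 2): bool
{
    foreach ($haystack as $item) {
        if (fuzzy_match($query, $item, $maxDistance)) {
            return true;
        }
    }
    return false;
}
```

With an array containing "Tom and Jerry", calling `fuzzy_in_array()` with "Tom Jerry", "Tom & Jerry", or "Tom and Jery" should return true under these assumptions. The threshold will likely need tuning: a fixed distance of 2 is generous for very short names, which is the "Berry" problem raised in the comments on the question.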

1 Comment

This is along the lines of what I had in mind. I was hoping someone smarter than me had already done this better than I could. If not, can you point me in the right direction for calculating the Hamming distance? Are there built-in PHP functions for this?
0

I just discovered two functions which appear to do what I want:

similar_text()

levenshtein()

Both seem to return an integer representing the "closeness" of the match between two strings. The difference between the two is over my head.
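
As a rough illustration of how the two differ: `levenshtein()` counts the single-character edits needed to turn one string into the other, so lower is closer, while `similar_text()` counts matching characters, so higher is closer, and it can also report a percentage through its optional third argument. The snippet below is only a sketch using the example strings from the question; the variable names are arbitrary.

```php
<?php
$a = 'Tom and Jerry';
$b = 'Tom and Jery';

// levenshtein(): number of insertions, deletions, or substitutions
// needed to turn $a into $b. Lower means closer; 0 means identical.
$distance = levenshtein($a, $b);

// similar_text(): number of matching characters. Higher means closer.
// The third argument is passed by reference and receives a percentage.
$common = similar_text($a, $b, $percent);

printf("levenshtein: %d, similar_text: %d (%.1f%%)\n", $distance, $common, $percent);
```

Because the two scales run in opposite directions, any threshold has to be chosen per function: a maximum distance for `levenshtein()`, or a minimum percentage for `similar_text()`.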

My search was aided by this S.O. question.
