8

I have been playing around a bit with a fairly simple, home-made search engine, and I'm now twiddling with some relevancy sorting code.

It's not very pretty, but I'm not very good when it comes to clever algorithms, so I was hoping I could get some advice :)

Basically, I want each search result to get scoring based on how many words match the search criteria. 3 points per exact word and one point for partial matches

For example, if I search for "winter snow", these would be the results:

  • winter snow => 6 points
  • winter snowing => 4 points
  • winterland snow => 4 points
  • winter sun => 3 points
  • winterland snowing => 2 points

Here's the code:

String[] resultWords = result.split(" ");
String[] searchWords = searchStr.split(" ");
int score = 0;
for (String resultWord : resultWords) {
    for (String searchWord : searchWords) {
        if (resultWord.equalsIgnoreCase(searchWord))
            score += 3;
        else if (resultWord.toLowerCase().contains(searchWord.toLowerCase()))
            score++;
    }
}
2
  • 1
    What exactly is the problem you're looking to solve? is it too slow? uses large amounts of memory? what optimization did you have in mind? Commented May 14, 2009 at 10:14
  • Speed mostly. Turns out it may be the database that's bottlenecking me though. Commented May 14, 2009 at 14:22

5 Answers 5

3

Your code seems ok to me. I suggest little changes:

Since your are going through all possible combinations you might get the toLowerCase() of your back at the start.

Also, if an exact match already occurred, you don't need to perform another equals.

    result = result.toLowerCase();
    searchStr = searchStr.toLowerCase();

    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = 0;
    for (String resultWord : resultWords) {
        boolean exactMatch = false;
        for (String searchWord : searchWords) {
            if (!exactMatch && resultWord.equals(searchWord)) {
                exactMatch = true;
                score += 3;
            } else if (resultWord.contains(searchWord))
                score++;
        }
    }

Of course, this is a very basic level. If you are really interested in this area of computer science and want to learn more about implementing search engines start with these terms:

Sign up to request clarification or add additional context in comments.

1 Comment

I got late :(. The only difference was I named it exactMatchFound. +1
1
  • stemming
  • for acronyms case sensitivity is important, i.e. SUN; any word that matches both content and case must be weighted more than 3 points (5 or 7)?
  • use the strategy design pattern

For example, consider this naive score model:

interface ScoreModel {
     int startingScore();
     int partialMatch();
     int exactMatch();
}

...

int search(String result, String searchStr, ScoreModel model) {
    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = model.startingScore();

    for (String resultWord : resultWords) {
        for (String searchWord : searchWords) {
            if (resultWord.equalsIgnoreCase(searchWord)) {
                score += model.exactMatch();
            } else if (resultWord.toLowerCase().contains(searchWord.toLowerCase())) {
                score += model.partialMatch();
            }
        }
    }

    return score;
}

Comments

0

Basic optimization can be done by preprocessing your database: don't split entries into words every time.

Build words list (prefer hash or binary tree to speedup search in the list) for every entry during adding it into DB, remove all too short words, lower case and store this data for further usage.

Do the same actions with the search string on search start (split, lower case, cleanup) and use this words list for comparing with every entry words list.

1 Comment

More advanced optimization can be done though filtering words forms (remove common endings), building one global hash [word => db entry] and using it for early entries filtering. For example - compare only entries that has at least one match in global hash with one of the search string words.
0

1) You can sort searchWords first. You could break out of the loop once your result word was alphabetically after your current search word.

2) Even better, sort both, then walk along both lists simultaneously to find where any matches occur.

Comments

0

You can use regular expressions for finding patterns and lengths of matched patterns (for latter classification/scoring).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.