
I am creating a web application using PHP and MySQL.

It is basically a simple search form with a single textbox.

The user input can be a combination of keywords, which I split with PHP's explode() function after calling str_ireplace().
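Here is a minimal sketch of the splitting step. It assumes keywords are separated by commas, as in the example further down. Trimming each piece matters: a plain explode(",", ...) on "val1, val2, val3" keeps the leading space in " val2", which is why ' val2' appears in the generated SQL below. The helper name splitKeywords() is hypothetical.

```php
<?php
// Hypothetical helper: split comma-separated search text into clean keywords.
function splitKeywords(string $input): array
{
    $parts = explode(',', $input);      // "val1, val2" -> ["val1", " val2"]
    $parts = array_map('trim', $parts); // strip surrounding whitespace
    // drop empty pieces such as those produced by "val1,,val2" or a trailing comma
    $parts = array_filter($parts, fn(string $p): bool => $p !== '');
    return array_values($parts);        // reindex from 0
}

// splitKeywords('val1, val2, val3') returns ['val1', 'val2', 'val3']
```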

Now I want to search for each keyword (say val 1, val 2, ..., val n) in each field (say field 1, field 2, ..., field n) of a single table.

I think I will need nested for loops: for each value, search all fields.

But how can I sort the results by relevance, i.e. so that records matching all values appear first, and so on?

Since this sorting does not happen at the database level, I cannot use an ORDER BY clause.
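If the matching rows have already been fetched into PHP, they can still be ranked there without ORDER BY. This is a minimal sketch. It assumes each row is an associative array and that each field holds a single keyword, as the asker confirms in the comments. The name rankRows() is hypothetical. A row's score is the number of distinct keywords it matches in any field, so rows that match every keyword come first.

```php
<?php
// Hypothetical helper: order fetched rows by how many keywords they match.
// Assumes exact, case-sensitive matching of whole field values.
function rankRows(array $rows, array $keywords, array $fields): array
{
    $scored = [];
    foreach ($rows as $row) {
        $score = 0;
        foreach ($keywords as $keyword) {
            foreach ($fields as $field) {
                if (isset($row[$field]) && $row[$field] === $keyword) {
                    $score++;   // count each keyword at most once per row
                    break;
                }
            }
        }
        if ($score > 0) {       // drop rows that match nothing
            $scored[] = ['score' => $score, 'row' => $row];
        }
    }
    // highest score first
    usort($scored, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return array_column($scored, 'row');
}
```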

EDIT: OK, I think I should explain in more detail what I am looking for and what I have achieved so far. I have written the following code; it almost serves my purpose, but it looks quite time-consuming to execute.

<?php
//$str = mysql_real_escape_string($_GET['searchText']);
$str = "val1, val2, val3";
$str = trim($str);

// check for an empty string and display a message.
if ($str == "") {
    $resultmsg =  "<p>Search Error: Please enter a search keyword...</p>" ;
}
    $str = explode(",",$str);
    //Create array for all fields
    $fields = array("filed1","filed2","filed3","filed4");
    $condition = "";

    for ($j=0;$j<count($str);$j++){
        for ($i=0;$i<count($fields);$i++){
            $condition = $condition.$fields[$i]." = '".$str[$j]."' OR ";            
        }
        $condition = rtrim($condition, " OR");
        $condition = $condition.") AND (";      
    }
    $condition = rtrim($condition, " AND (");
    $sql = "SELECT * FROM TABLE WHERE (".$condition;
    echo $sql;
    echo "<br /><hr>";

    $condition = "";
    for ($j=0;$j<count($str);$j++){
        for ($i=0;$i<count($fields);$i++){
            $condition = $condition.$fields[$i]." = '%".$str[$j]."%' OR ";          
        }
        $condition = rtrim($condition, " OR");
        $condition = $condition.") AND (";      
    }
    $condition = rtrim($condition, " AND (");
    //$condition = str_replace("="," LIKE ",$condition);
    $sql= "SELECT * FROM TABLE WHERE (".$condition;
    echo $sql;
    echo "<br /><hr>";

    //testing
    if (count($str)==3){
    $condition = "";
    for ($j=0;$j<count($str)-1;$j++){
        for ($i=0;$i<count($fields);$i++){
            $condition = $condition.$fields[$i]." = '%".$str[$j]."%' OR ";          
        }
        $condition = rtrim($condition, " OR");
        $condition = $condition.") AND (";      
    }
    $condition = rtrim($condition, " AND (");
    //$condition = str_replace("="," LIKE ",$condition);
    $sql= "SELECT * FROM TABLE WHERE (".$condition;
    echo "<strong>Matching ".$str[0]." AND ".$str[1]."<br /></strong>";
    echo $sql;
    echo "<br /><hr>";
    $condition = "";
    for ($j=1;$j<count($str);$j++){
        for ($i=0;$i<count($fields);$i++){
            $condition = $condition.$fields[$i]." = '%".$str[$j]."%' OR ";          
        }
        $condition = rtrim($condition, " OR");
        $condition = $condition.") AND (";      
    }
    $condition = rtrim($condition, " AND (");
    //$condition = str_replace("="," LIKE ",$condition);
    $sql= "SELECT * FROM TABLE WHERE (".$condition;
    echo "<strong>Matching ".$str[1]." AND ".$str[2]."<br /></strong>";
    echo $sql;
    echo "<br /><hr>";
    $condition = "";
    for ($j=0;$j<count($str);$j=$j+2){
        for ($i=0;$i<count($fields);$i++){
            $condition = $condition.$fields[$i]." = '%".$str[$j]."%' OR ";          
        }
        $condition = rtrim($condition, " OR");
        $condition = $condition.") AND (";      
    }
    $condition = rtrim($condition, " AND (");
    //$condition = str_replace("="," LIKE ",$condition);
    $sql= "SELECT * FROM TABLE WHERE (".$condition;
    echo "<strong>Matching ".$str[2]." AND ".$str[0]."<br /></strong>";
    echo $sql;
    echo "<br /><hr>";
    }

?>

The output I am getting is as follows:

Matching all values EXACTLY

SELECT * FROM TABLE WHERE (filed1 = 'val1' OR filed2 = 'val1' OR filed3 = 'val1' OR filed4 = 'val1') AND (filed1 = ' val2' OR filed2 = ' val2' OR filed3 = ' val2' OR filed4 = ' val2') AND (filed1 = ' val3' OR filed2 = ' val3' OR filed3 = ' val3' OR filed4 = ' val3')

Matching all values PARTIALLY

SELECT * FROM TABLE WHERE (filed1 = '%val1%' OR filed2 = '%val1%' OR filed3 = '%val1%' OR filed4 = '%val1%') AND (filed1 = '% val2%' OR filed2 = '% val2%' OR filed3 = '% val2%' OR filed4 = '% val2%') AND (filed1 = '% val3%' OR filed2 = '% val3%' OR filed3 = '% val3%' OR filed4 = '% val3%')

Matching val1 AND val2

SELECT * FROM TABLE WHERE (filed1 = '%val1%' OR filed2 = '%val1%' OR filed3 = '%val1%' OR filed4 = '%val1%') AND (filed1 = '% val2%' OR filed2 = '% val2%' OR filed3 = '% val2%' OR filed4 = '% val2%')

Matching val2 AND val3

SELECT * FROM TABLE WHERE (filed1 = '% val2%' OR filed2 = '% val2%' OR filed3 = '% val2%' OR filed4 = '% val2%') AND (filed1 = '% val3%' OR filed2 = '% val3%' OR filed3 = '% val3%' OR filed4 = '% val3%')

Matching val3 AND val1

SELECT * FROM TABLE WHERE (filed1 = '%val1%' OR filed2 = '%val1%' OR filed3 = '%val1%' OR filed4 = '%val1%') AND (filed1 = '% val3%' OR filed2 = '% val3%' OR filed3 = '% val3%' OR filed4 = '% val3%')

I can now keep appending the fetched data to my result table, but somehow I don't feel this is a smart solution. Moreover, I am restricted to a fixed number of search values (e.g. 3 here). I hope I have been able to explain exactly what I am looking for.

  • Hoo boy, that's a tough one. I'm posting this as a comment because it's not directly an answer to your question, but you may want to create a 'search keywords' table. It's denormalised, so you are duplicating data for the sake of optimisation. Questions, though: 1. Are you doing a LIKE search or an exact match? 2. As a corollary of 1, will there be only one keyword in each field, or will it be a collection of words separated by spaces/punctuation? Commented Oct 18, 2011 at 5:25
  • Yes, I know this is difficult. For your Q1: does it really matter whether it is exact or LIKE? I will use a wildcard character and count both as one occurrence. For Q2, I will instruct users to use a + sign to separate keywords, for the sake of simplicity. I was wondering: what if I run a number of separate queries (one per keyword) and then merge the results into a single table? Commented Oct 18, 2011 at 6:17
  • You have something in mind and you ask how it can be done; no problem so far. However, without satisfying information and detailed accompanying rules, I believe any answer given will be based on your own frame of mind and will be limited to a specific path due to lack of information. Commented Oct 18, 2011 at 7:16
  • @ArunavaDey My reason for asking the questions was that, if each field only holds one keyword, then you won't have to use LIKE, but if it holds more than one, you will. It'll certainly be fastest (in development time) to use a number of queries and then compile the results in PHP, but it won't be very optimised. The fastest (in server time) is a separate search table... I'll wait for more info before I explain how a search table works, though. Commented Oct 18, 2011 at 20:35
  • @Paul d'Aoust In my database each field only holds one keyword. So I don't feel I need to use LIKE. Commented Oct 19, 2011 at 13:25
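Here is a minimal sketch of the "one query per keyword, then merge in PHP" approach discussed in these comments. It assumes each per-keyword query returns rows as associative arrays with a unique id column. The name mergeByHits() and the 'id' key are hypothetical. A row's relevance is the number of per-keyword result sets it appeared in. Note that this approach costs one database round trip per keyword.

```php
<?php
// Hypothetical helper: merge per-keyword result sets and rank rows by hit count.
// $resultSets holds one array of rows per keyword; each row has a unique 'id'.
function mergeByHits(array $resultSets): array
{
    $rows = [];   // id => row
    $hits = [];   // id => number of keyword queries that returned this row
    foreach ($resultSets as $set) {
        foreach ($set as $row) {
            $id = $row['id'];
            $rows[$id] = $row;
            $hits[$id] = ($hits[$id] ?? 0) + 1;
        }
    }
    arsort($hits);   // most hits first; keys (ids) are preserved
    $merged = [];
    foreach (array_keys($hits) as $id) {
        $merged[] = $rows[$id];
    }
    return $merged;
}
```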

1 Answer


Here's an answer that deals with the issue of scalability (the restriction on the number of keywords a user can search) and with sorting in order of relevance (the number of matched keywords per row). I've removed things like checking for empty values, but I've added a few things; read the comments to see what. I haven't tested it, so I don't know how well it performs...

<?php
// $str is assumed to hold the raw search text, e.g. $_GET['searchText'], and
// $mysqli an open mysqli connection (the mysql_* functions were removed in PHP 7).
$keywords = preg_split('/[\s,\+]+/', $str, -1, PREG_SPLIT_NO_EMPTY);
// splits $str into individual words when it finds spaces, commas, and/or plus
// signs. This way, you won't have to force users to use plus signs
$fields = array('field1', 'field2', 'field3', 'field4');
foreach ($keywords as $i => $keyword) {
    // escapes and quotes those keywords to protect against injection attacks
    $keywords[$i] = "'" . $mysqli->real_escape_string($keyword) . "'";
}
// concatenates the keywords into one string that we can use as a set in
// MySQL's IN() clause
$keywords = implode(',', $keywords);
$fieldSearchQueries = array();
foreach ($fields as $thisField) {
    // here's the IN() clause, checking each field against the set of keywords
    $fieldSearchQueries[] = 'CASE WHEN ' . $thisField . ' IN (' . $keywords . ')'
        . ' THEN 1 ELSE 0 END';
}
// a select-list alias can't be used in WHERE, so filter with HAVING; `rank` is
// backquoted because RANK is a reserved word as of MySQL 8.0
$query = 'SELECT *, (' . implode(' + ', $fieldSearchQueries) . ') AS `rank` '
    . 'FROM TABLE HAVING `rank` > 0 ORDER BY `rank` DESC';
?>

This should create a query something like this (shown here for the keywords India, Rainfall and English):

SELECT *,
    (CASE WHEN field1 IN ('India', 'Rainfall', 'English') THEN 1 ELSE 0 END
    + CASE WHEN field2 IN ('India', 'Rainfall', 'English') THEN 1 ELSE 0 END
    + CASE WHEN field3 IN ('India', 'Rainfall', 'English') THEN 1 ELSE 0 END
    + CASE WHEN field4 IN ('India', 'Rainfall', 'English') THEN 1 ELSE 0 END)
    AS `rank`
FROM TABLE
HAVING `rank` > 0
ORDER BY `rank` DESC

I just learned about CASE WHEN today, from this answer which is about an almost identical problem. The way I've set it up, it should return 1 if field1 is in the keywords, then add 1 if field2 is in the keywords, and so on. This gives you rank, a new field that you can sort by. The original answerer says it's not very efficient, though, so you might want to check out the other solutions on that page as well.
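On current PHP versions, the same rank query can be built with placeholders instead of escaping values by hand. This is a minimal sketch. It assumes PDO with the MySQL driver and an open connection in $pdo. The name buildRankQuery() is hypothetical. Column names must come from a trusted whitelist, because identifiers cannot be bound as parameters. TABLE remains a placeholder for your table name. The keyword list must not be empty, or the IN () clause becomes invalid SQL.

```php
<?php
// Hypothetical helper: build the relevance query with positional placeholders.
// $fields must be trusted column names; only the keyword values are bound.
function buildRankQuery(array $fields, array $keywords): array
{
    // one "?" per keyword, e.g. "?, ?, ?"
    $set = implode(', ', array_fill(0, count($keywords), '?'));
    $terms = [];
    $params = [];
    foreach ($fields as $field) {
        $terms[] = "CASE WHEN $field IN ($set) THEN 1 ELSE 0 END";
        // each IN() list needs its own copy of the keyword values
        $params = array_merge($params, $keywords);
    }
    // TABLE is a placeholder for the real table name
    $sql = 'SELECT *, (' . implode(' + ', $terms) . ') AS `rank` '
         . 'FROM TABLE HAVING `rank` > 0 ORDER BY `rank` DESC';
    return [$sql, $params];
}

// Usage, assuming $pdo is an open PDO connection:
// [$sql, $params] = buildRankQuery(['field1', 'field2'], ['India', 'Rainfall']);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```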


9 Comments

For CASE expressions that simply choose between two values, IF() is even easier: IF(field1 IN (...), 1, 0) + IF(field2 IN (...), 1, 0) + ...
Oh nice! Thanks for the info. Do you know if it's more efficient at all? (To be honest, I don't know in what way CASE WHEN is inefficient; I just got that comment from the answer I linked to :-)
This works superbly, but it has a drawback. For example, I'm searching with a single keyword, "bumper", and my item columns contain values like "front bumper", "back bumper" and "small bumper". It should return these rows, but it returns 0 results: SELECT *, ( CASE WHEN oem_part_no IN ('bumper') THEN 1 ELSE 0 END + CASE WHEN brand IN ('bumper') THEN 1 ELSE 0 END + CASE WHEN item IN ('bumper') THEN 1 ELSE 0 END + CASE WHEN vehicle IN ('bumper') THEN 1 ELSE 0 END + CASE WHEN make IN ('bumper') THEN 1 ELSE 0 END ) AS rank FROM items having rank >0 ORDER BY rank DESC
@kabir if you're doing fuzzy search that's a little different; the original question/answer dealt with exact value-to-value matches. You'd want something like CASE WHEN oem_part_no LIKE '%bumper%' THEN 1 ELSE 0 END. If you've got multiple search terms it'll get more complicated still.
@kabir that's true, as I mentioned in my comment. Sounds like you'll need something a bit more complicated -- how about using CASE WHEN field1 = 'bumper' THEN 2 ELSE 0 END + CASE WHEN field1 LIKE '%bumper%' THEN 1 ELSE 0 END for each field you're searching against? That'd make exact matches have double the ranking of fuzzy matches. Then duplicate those two CASEs for each keyword, and then duplicate the whole thing for each field you're searching against. At this stage the query is getting quite verbose, but you'll get a decent relevance search.
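The weighting suggested in the last comment (2 points for an exact match and 1 for a substring match, for every keyword against every field) can be generated instead of written by hand. This is a minimal sketch that returns SQL with positional placeholders plus the values to bind. The name buildWeightedRank() is hypothetical. LIKE wildcards (% and _) inside a keyword are not escaped here. An exact match also satisfies the LIKE test, so it scores 3 in total against 1 for a partial match.

```php
<?php
// Hypothetical helper: exact matches score 2, substring (LIKE) matches score 1,
// for every keyword against every field. Column names must be trusted.
function buildWeightedRank(array $fields, array $keywords): array
{
    $terms = [];
    $params = [];
    foreach ($keywords as $keyword) {
        foreach ($fields as $field) {
            $terms[] = "CASE WHEN $field = ? THEN 2 ELSE 0 END";
            $terms[] = "CASE WHEN $field LIKE ? THEN 1 ELSE 0 END";
            $params[] = $keyword;              // exact comparison
            $params[] = '%' . $keyword . '%';  // substring match
        }
    }
    // TABLE is a placeholder for the real table name
    $sql = 'SELECT *, (' . implode(' + ', $terms) . ') AS `rank` '
         . 'FROM TABLE HAVING `rank` > 0 ORDER BY `rank` DESC';
    return [$sql, $params];
}
```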
