I am trying to write a custom search query parser. The idea is that the user can prefix terms with specific keywords to search by, e.g. artist, color and style. For example, if the user searches for:

style:Emboss some keywords color:#333333 artist:"Tom Hank" steel

The returned result in the backend would be:

array(
    "style"  => "Emboss",
    0        => "some",
    1        => "keywords",
    "color"  => "#333333",
    "artist" => "Tom Hank", // Note the word is not broken
    2        => "steel"
)

So far I have managed to do the opposite, building a query string from an array, with no problem. However, I have a problem parsing a string into an array, mostly because of the quotes.

What I have so far is:

public function parseQuery($str) {
    $arr = array();

    $pairs = str_getcsv($str, ' '); // This bugs me

    foreach($pairs as $k => $v) {
        // Pad to two elements so $value is null when the token has no ':'.
        list($name, $value) = array_pad(explode(":", $v, 2), 2, null);

        if(!isset($value)) {
            $arr[] = $name;
        } else {
            $arr[$name] = $value;
        }
    }

    return $arr;
}

The problem lies in the str_getcsv function, which only treats a quote as an enclosure when it opens a field. If there is no space before the opening quote, the quoted phrase is split at its spaces. It breaks the string down like this:

Array
(
    [0] => Some
    [1] => string
    [2] => with
    [3] => but:"some <--- This is the sinner
    [4] => string"
)
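
The behaviour can be reproduced with a short script. This is only a minimal sketch: the input string is an assumption reconstructed from the output above, and the enclosure and escape arguments are spelled out just to make the defaults explicit.

```php
<?php
// Assumed input, reconstructed from the output shown above.
$str = 'Some string with but:"some string"';

// Space as delimiter, double quote as enclosure, backslash as escape.
// A quote is only honoured as an enclosure when it opens a field,
// so the quoted phrase after "but:" is split at its space.
$parts = str_getcsv($str, ' ', '"', '\\');

print_r($parts);
```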

It works if there is a space between but: and "some string", but I do not want to require that.

My question is how this could be solved using little to no regular expressions.

  • What is the reason you don't want to use regex? Commented Jun 27, 2013 at 0:03
  • I am rather confused. You want to do something like what Google does in its search, is that it? Like you can search for php "mysql" site:stackoverflow.com where MySQL would be the main necessary word and php a side word, or something like that? Commented Jun 27, 2013 at 0:06
  • Why don't you just introduce a delimiter like ; in your syntax? Example query: style:Emboss some keywords;color:#333333;artist:"Tom Hank" .. Commented Jun 27, 2013 at 0:07
  • @nifr a delimiter (such as ;) wouldn't be intuitive for anyone except maybe a programmer. Also, the keywords are separate entities from each other unless the : gives it the type, and " allows for spacing. Commented Jun 27, 2013 at 0:09
  • @Humanoidism Is this something you once heard somebody tell you or from personal experience? Seriously, what you want just screams regex. And there is no way you are going to be hurt by any performance hit (if there is one, it is far too small to notice). Commented Jun 27, 2013 at 0:11

1 Answer

Try this. It's quick and dirty procedural code, but it does what you want. You'll have to refactor it to make it maintainable.

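<?php
$str = 'style:Emboss some keywords color:#333333 artist:"Tom Hank" steel';

$pos = 0;
$buffer = '';      // characters of the token currently being read
$len = strlen($str);
$quote = false;    // true while inside double quotes
$key = '';         // pending key, set when an unquoted ':' is seen
$arr = array();

while ($pos < $len) {
    $char = $str[$pos];
    switch ($char) {
        case '"':
            // Toggle quoted mode; the quote character itself is dropped.
            $quote = !$quote;
            break;
        case ':':
            if ($quote || $key !== '') {
                // Inside quotes, or a key is already pending: keep ':' as data.
                $buffer .= $char;
            } else {
                $key = $buffer;
                $buffer = '';
            }
            break;
        case ' ':
            if ($quote) {
                // Spaces inside quotes belong to the value.
                $buffer .= $char;
            } elseif ($key !== '') {
                $arr[$key] = $buffer;
                $key = '';
                $buffer = '';
            } elseif ($buffer !== '') {
                // Skip the empty tokens produced by repeated spaces.
                $arr[] = $buffer;
                $buffer = '';
            }
            break;
        default:
            $buffer .= $char;
    }
    $pos++;
}
// Flush whatever is left after the last character.
if ($key !== '') {
    $arr[$key] = $buffer;
} elseif ($buffer !== '') {
    $arr[] = $buffer;
}

print_r($arr);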
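
For comparison, since the comments on the question suggest a regular expression, here is a rough sketch of that route using preg_match_all. It is an alternative technique, not part of the code above. The function name parseQueryRegex and the pattern are assumptions made for illustration, and the pattern only covers the simple cases in the question (an optional key: prefix followed by either a double-quoted phrase or a run of non-space characters).

```php
<?php
// Hypothetical helper; the name and the pattern are illustrative assumptions.
function parseQueryRegex(string $str): array
{
    $result = array();

    // Group 1: optional key before ':'.
    // Group 2: contents of a double-quoted phrase.
    // Group 3: an unquoted run of non-space characters.
    $pattern = '/(?:([^\s:"]+):)?(?:"([^"]*)"|(\S+))/';
    preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $key = $m[1];
        // Group 3 is absent when the quoted alternative matched.
        $value = $m[3] ?? $m[2];

        if ($key !== '') {
            $result[$key] = $value;
        } else {
            $result[] = $value;
        }
    }

    return $result;
}

print_r(parseQueryRegex('style:Emboss some keywords color:#333333 artist:"Tom Hank" steel'));
```

The character-by-character loop above avoids regex entirely, which is what the question asked for. The regex version is shorter but harder to extend, for example if escaped quotes inside a phrase ever need to be supported.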