1

These are the only two tables that are germane. No need to bother you with the others.

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

Basically, users enter a boolean search string, using skill names.

I will convert the skill snames to skill_id, and then want to generate a MySql query to get all matching job_id from table skill_usage, by parsing the user's search string.

Strings can contain skill name, the operators AND and OR, plus brackets for precedence.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

But there is no limit to the complexity of expression allowed - and that's my problem.

I'm no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage then parse, and build up the rest of the query.

For the first example, just skill name C, I want to add WHERE skillId = X, where X is gotten from the table skill_names.

For a simple OR, like C OR C++, I can use an IN clause - WHERE skillId IN (X, Y) (again, X & Y are lookups of the skill names to get a skill_id).

For a simple AND, like C++ AND UML, I reckon I need an INNER JOIN, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y (where X is the skill_id for C++ and Y for UML).

I think that is roughly correct, for those simple cases (?).

But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML), I get confused.

Would a regex or an algorithm be appropriate here?

@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).

Update

I am trying to separate the parsing from the queries, and am using this question to sort out the queries.

I am getting answers like

SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);

and

select s1.job_id
  from skill_usage s1
  where s1.skill_id = 3
    and s1.job_id in (
                       select s2.job_id
                         from skill_usage s2
                        where s2.skill_id = 4
                     )

Where the latter looks more extensible.

Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly

fail if mis-matched brackets

reduce multiple spaces to single
removes spaces before and after closing/opening bracket  "( " & " )"

foreach c in string

   if c == (
   
   else
      if c === )
      
      else
         if AND
         
         else
           if OE
           
           else
              # it's a skill name
12
  • 1
    Why look up skill_name to get skill_id? Just use skill_name since you already have it Commented Aug 14, 2020 at 18:30
  • 1
    Also, I wouldn't let user enter that string. Use dropdowns or let them add to a multiple select and construct it yourself from that. In this way you could present them with skill_name associate skill_id in the form and use that if the query is easier. Commented Aug 14, 2020 at 18:33
  • 1
    lekkerlogic.com/2016/02/… Commented Aug 15, 2020 at 10:50
  • 1
    Here is what you need: stackoverflow.com/questions/9130284/… Commented Aug 15, 2020 at 10:54
  • 1
    Hint: dont JOIN, just emit AND EXISTS(...from skill_usage WHERE ...) for every skills-tag you encounter in the source string. ( ) OR AND NOT can be emitted as is. Commented Aug 16, 2020 at 12:27

1 Answer 1

2
+200

Simple query generator, assuming PDO


        ## for simple tokenisation, the terms are separated by space here.
        ## ###############################################################
$string = "( C AND kernel ) OR ( C++ AND UML )";

function emit_term( $tag ) {
$res = " EXISTS (
                SELECT *
                FROM skill_usage su
                JOIN skill_names sn ON sn.skill_id = su.skill_id
                WHERE su.Job_id = j.job_id
                AND sn.skillname = :" . $tag . ")\n";
return $res;
}


$fixed_part ="
SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND \n" ;


# $tokens = explode( ' ' , $string ); #splits on any single space
$tokens = preg_split( '/[\s]+/' , $string ); # accepts multiple whitespace
# print_r ( $tokens );

$query = $fixed_part;

$args = array();
$num = 1;
foreach ( $tokens as $tok ) {
        switch ($tok) {
        case '':  # skip empty tokens
        case ';':  # No, you should not!
        case '"':
        case "'":
        case ';':  break;
        case '(': $query .= '('; break;
        case ')': $query .= ')'; break;
        case '&':
        case 'AND': $query .= ' AND '; break;
        case '|':
        case 'OR': $query .= ' OR '; break;
        case '!':
        case 'NOT': $query .= ' NOT '; break;
        default:
                $tag = '_q' . $num ;
                $query .= emit_term ( $tag );
                $args[$tag] = $tok;
                $num += 1;
                 break;
                }
        }
$query .= ";\n\n";

echo "Query + parameters (for PDO):\n" ;
echo $query;
print_r ( $args) ;
          

Output:


SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND 
( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q1)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q2)
) OR ( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q3)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q4)
);

Array
(
    [_q1] => C
    [_q2] => kernel
    [_q3] => C++
    [_q4] => UML
)

                     
Sign up to request clarification or add additional context in comments.

4 Comments

It looks great, but it is failing on "( C OR C++ OR UML )". I you can make it handle multiple AND / OR, I will award a few 100 bonus. Thanks!
Just remove the multiple spaces.
... or skip the empty tokens when generating the query. (see update)
D'oh! Thanks. Much better than the solution that I was working towards. I will award you the 200 bounty as soon as the system permits it

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.