Say I have the following table, named data:

ID   foo1     foo2    foo3
1    11       22      33
2    22       17      92
3    31       33      53
4    53       22      11
5    43       23      9

I want to select all rows where foo1, foo2 or foo3 matches any of these columns in the first row. That is, I want all rows where at least one of the foos also appears in the first row. In the example above, I want to select rows 1, 2, 3 and 4. I thought that I could use something like

SELECT * FROM data WHERE foo1 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo2 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo3 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)

but this does not seem to work. I can, of course, use

WHERE foo1=(SELECT foo1 FROM data WHERE ID=1) 
   OR foo1=(SELECT foo2 FROM data WHERE ID=1) 
   OR ...

but that would involve many lines, and in my real data set there are actually 16 columns, so it will really be a pain in the lower back. Is there a more sophisticated way to do so?

Also, what should I do if I also want to count the number of hits (in the example above, get 3 for row 1, 2 for row 4, and 1 for rows 2 and 3)?

  • Why only get 1 as the number of hits for row 1, and not 3? Commented Dec 18, 2012 at 23:42
  • @eggyal - you are correct, of course. I simply am not too interested in the result for row 1 and it will be discarded at the end any how. I edited the post. Commented Dec 19, 2012 at 7:31

2 Answers

SELECT data.*,
      (data.foo1 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo2 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo3 IN (t.foo1, t.foo2, t.foo3)) AS number_of_hits
FROM   data JOIN data t ON t.id = 1
WHERE  data.foo1 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo2 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo3 IN (t.foo1, t.foo2, t.foo3)

See it on sqlfiddle.
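
With 16 columns, the repeated IN lists get long, so it may be easier to generate the query text from a list of column names. Below is a minimal sketch of that idea, assuming Python's sqlite3 module as a stand-in for MySQL (both evaluate a comparison to 0 or 1, which is what makes the addition work). The names `columns`, `terms` and `hits` are only illustrative, and the table holds the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ID INTEGER PRIMARY KEY, foo1 INT, foo2 INT, foo3 INT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1, 11, 22, 33), (2, 22, 17, 92), (3, 31, 33, 53), (4, 53, 22, 11), (5, 43, 23, 9)],
)

# Column names are trusted constants here, so building SQL text from them is safe.
columns = ["foo1", "foo2", "foo3"]
reference = ", ".join(f"t.{c}" for c in columns)             # t.foo1, t.foo2, t.foo3
terms = [f"(data.{c} IN ({reference}))" for c in columns]    # one 0/1 test per column

query = f"""
SELECT data.ID, {' + '.join(terms)} AS number_of_hits
FROM   data JOIN data t ON t.ID = 1
WHERE  {' OR '.join(terms)}
ORDER BY data.ID
"""

hits = conn.execute(query).fetchall()
for row_id, number_of_hits in hits:
    print(row_id, number_of_hits)
```

Extending `columns` to all 16 names produces the full query without writing each IN list by hand.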

Actually, on reflection, you might consider normalising your data:

CREATE TABLE data_new (
  ID         BIGINT  UNSIGNED NOT NULL,
  foo_number TINYINT UNSIGNED NOT NULL,
  val        INT,
  PRIMARY KEY (ID, foo_number),
  INDEX (val)
);

INSERT INTO data_new
  (ID, foo_number, val)
          SELECT ID, 1, foo1 FROM data
UNION ALL SELECT ID, 2, foo2 FROM data
UNION ALL SELECT ID, 3, foo3 FROM data;

DROP TABLE data;

Then you can do:

SELECT   ID,
         MAX(IF(foo_number=1,val,NULL)) AS foo1,
         MAX(IF(foo_number=2,val,NULL)) AS foo2,
         MAX(IF(foo_number=3,val,NULL)) AS foo3,
         number_of_hits
FROM     data_new JOIN (
  SELECT   d1.ID, COUNT(*) AS number_of_hits
  FROM     data_new d1 JOIN data_new d2 USING (val)
  WHERE    d2.ID = 1
  GROUP BY d1.ID
) t USING (ID)
GROUP BY ID

See it on sqlfiddle.

As you can see from the execution plan, this will be considerably more efficient for large data sets.
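
To try the normalised layout on the sample rows, here is a rough sketch, again assuming Python's sqlite3 module in place of MySQL. MAX(CASE WHEN ... END) is used as a portable spelling of MAX(IF(...)), and number_of_hits is wrapped in MAX() only so the query does not rely on how a given engine treats non-aggregated columns. The index name `data_new_val` and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_new (
  ID         INTEGER NOT NULL,
  foo_number INTEGER NOT NULL,
  val        INTEGER,
  PRIMARY KEY (ID, foo_number)
);
CREATE INDEX data_new_val ON data_new (val);
""")

# Sample rows from the question, unpivoted into one (ID, foo_number, val) row per value.
wide_rows = [(1, 11, 22, 33), (2, 22, 17, 92), (3, 31, 33, 53), (4, 53, 22, 11), (5, 43, 23, 9)]
conn.executemany(
    "INSERT INTO data_new (ID, foo_number, val) VALUES (?, ?, ?)",
    [(r[0], n, r[n]) for r in wide_rows for n in (1, 2, 3)],
)

query = """
SELECT   ID,
         MAX(CASE WHEN foo_number = 1 THEN val END) AS foo1,
         MAX(CASE WHEN foo_number = 2 THEN val END) AS foo2,
         MAX(CASE WHEN foo_number = 3 THEN val END) AS foo3,
         MAX(number_of_hits) AS number_of_hits
FROM     data_new JOIN (
  -- one joined row per value shared with ID 1, counted per ID
  SELECT   d1.ID, COUNT(*) AS number_of_hits
  FROM     data_new d1 JOIN data_new d2 USING (val)
  WHERE    d2.ID = 1
  GROUP BY d1.ID
) t USING (ID)
GROUP BY ID
ORDER BY ID
"""

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```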

3 Comments

Thanks for the answer. Could you please explain what MAX does in MAX(IF(foo_number=...,val,NULL))? It only accepts one argument, doesn't it?
@yohai: Because of the GROUP BY operation, each group contains multiple records - but only one value can appear for each group in the resultset. MySQL's MAX() aggregation function chooses the record with the maximum value; if there's only one non-NULL record - and there will be owing to the PK on (ID, foo_number) - then it is simply going to select that record.
Without such an aggregation function, MySQL is free to select the value from any record in the group (and it will do so indeterminately), so it could select the value from another foo_number - which will be NULL due to the IF().

There are several ways to get the result set.

Here's one approach (if you don't care about which fooN gets matched with which fooN, and also want to return that "first" row).

SELECT DISTINCT d.* 
  FROM ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)

That ON clause could also be written like this:

    ON d.foo1 = f.foo
    OR d.foo2 = f.foo
    OR d.foo3 = f.foo
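
As a quick check that the two spellings of the ON clause select the same rows, here is a small sketch that runs both against the sample data. It assumes Python's sqlite3 module as a stand-in for MySQL; `template`, `with_in` and `with_or` are illustrative names, and the ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ID INTEGER PRIMARY KEY, foo1 INT, foo2 INT, foo3 INT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1, 11, 22, 33), (2, 22, 17, 92), (3, 31, 33, 53), (4, 53, 22, 11), (5, 43, 23, 9)],
)

# The derived table f turns row 1 into three single-column rows: 11, 22, 33.
template = """
SELECT DISTINCT d.*
  FROM ( SELECT foo1 AS foo FROM data WHERE ID = 1
          UNION ALL
         SELECT foo2 FROM data WHERE ID = 1
          UNION ALL
         SELECT foo3 FROM data WHERE ID = 1
       ) f
  JOIN data d
    ON {on_clause}
 ORDER BY d.ID
"""

with_in = conn.execute(
    template.format(on_clause="f.foo IN (d.foo1, d.foo2, d.foo3)")
).fetchall()
with_or = conn.execute(
    template.format(on_clause="d.foo1 = f.foo OR d.foo2 = f.foo OR d.foo3 = f.foo")
).fetchall()

print(with_in)
print(with_in == with_or)
```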

To get a "count" of the hits...

SELECT d.id
     , d.foo1
     , d.foo2
     , d.foo3
     , SUM( IFNULL(d.foo1=f.foo,0)
           +IFNULL(d.foo2=f.foo,0)
           +IFNULL(d.foo3=f.foo,0)
       ) AS count_of_hits
  FROM ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)
 GROUP
    BY d.id
     , d.foo1
     , d.foo2
     , d.foo3


eggyal is right, as usual. Getting the count of hits is actually much simpler: the join has already done all the necessary comparisons, so we can just use a SUM(1) or COUNT(1) aggregate instead of repeating them.

SELECT d.id
     , d.foo1
     , d.foo2
     , d.foo3
     , COUNT(1) AS count_of_hits
  FROM ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)
 GROUP
    BY d.id
     , d.foo1
     , d.foo2
     , d.foo3
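
One thing that may be worth checking on real data: the COUNT form counts joined rows, while the SUM-of-comparisons form counts matching columns, so the two can differ when a single row repeats a matching value. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL and adds a hypothetical row 6 (22, 22, 5), which is not in the question, to show the difference. It groups by d.ID only, for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ID INTEGER PRIMARY KEY, foo1 INT, foo2 INT, foo3 INT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1, 11, 22, 33), (2, 22, 17, 92), (3, 31, 33, 53), (4, 53, 22, 11), (5, 43, 23, 9),
     (6, 22, 22, 5)],  # hypothetical extra row with a repeated value
)

template = """
SELECT d.ID, {aggregate} AS count_of_hits
  FROM ( SELECT foo1 AS foo FROM data WHERE ID = 1
          UNION ALL
         SELECT foo2 FROM data WHERE ID = 1
          UNION ALL
         SELECT foo3 FROM data WHERE ID = 1
       ) f
  JOIN data d
    ON f.foo IN (d.foo1, d.foo2, d.foo3)
 GROUP BY d.ID
 ORDER BY d.ID
"""

# COUNT(*) counts one hit per (row, reference value) pair that joined.
by_count = dict(conn.execute(template.format(aggregate="COUNT(*)")).fetchall())

# The comparison sum counts one hit per matching column.
sum_expr = ("SUM( IFNULL(d.foo1 = f.foo, 0)"
            " + IFNULL(d.foo2 = f.foo, 0)"
            " + IFNULL(d.foo3 = f.foo, 0) )")
by_sum = dict(conn.execute(template.format(aggregate=sum_expr)).fetchall())

print(by_count)
print(by_sum)
```

For rows 1 to 4 both forms agree; only the hypothetical row 6 differs (1 versus 2), so which one is right depends on whether a repeated value should count once or twice.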

4 Comments

+1 I do like the elegance of this approach (especially since one can get the number of hits by grouping on d.id and counting group members instead of the DISTINCT filter); but won't it be quite unfriendly to indexes on the foo columns (can indexes be used in evaluating the right-side of an IN expression)?
@eggyal, good question. I believe the optimizer sees that IN as being identical to equality comparisons OR'd together (as I showed in the rewritten ON clause).
@eggyal: good catch, getting the count of hits is fairly straightforward. I had a bunch of comparisons, but those aren't needed (as you suggested); we can just use a SUM(1) or COUNT(1) aggregate.
Aye. COUNT(*) can often be better optimised than SUM(1) or COUNT(1).
