mysql select for simple join for rows having certain cells with duplicate values

Question

I am trying to do a MySQL SELECT with two tables that results in showing columns from each of the tables only when there are multiple rows with two of the columns have duplicated values.

The following SELECT statement gives customer_id, order_id, product_ids and skus for all orders placed by a customer...

select orders2.customer_id, items2.order_id, items2.product_id, items2.sku 
from orders orders2, order_items items2
where orders2.status = 'complete' 
    and orders2.customer_id is not null
    and orders2.entity_id = items2.order_id
order by orders2.customer_id ASC, items2.product_id ASC;

Giving me results of...

+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku                      |
+-------------+----------+------------+--------------------------+
|       29813 |    38025 |        306 | BB_MAT101                |
|       29813 |    38027 |        309 | BB_MAT250                |
|       29814 |    28844 |        302 | BB_ENG101                |
|       29814 |    27615 |        384 | BB_MS-ACC101             |
|       29814 |    27616 |        385 | BB_MS-ACC102             |
|       29814 |    27615 |        385 | BB_MS-ACC102             |
|       29814 |    27614 |        409 | BB_MS-MAT101             |
|       29814 |    27584 |        410 | BB_MS-MAT150             |
|       29815 |    27592 |        384 | BB_MS-ACC101             |
|       29815 |    27593 |        384 | BB_MS-ACC101             |
|       29815 |    27594 |        384 | BB_MS-ACC101             |
|       29815 |    27599 |        385 | BB_MS-ACC102             |
|       29815 |    27592 |        402 | BB_MS-ECON101            |
|       29815 |    27593 |        402 | BB_MS-ECON101            |
|       29815 |    27594 |        402 | BB_MS-ECON101            |
|       29815 |    27596 |        403 | BB_MS-ECON102            |
|       29815 |    27598 |        404 | BB_MS-ENG099             |
|       29815 |    27588 |        405 | BB_MS-ENG101             |
|       29815 |    27595 |        406 | BB_MS-ENG102             |
|       29815 |    27589 |        408 | BB_MS-MAT099             |
|       29815 |    27585 |        409 | BB_MS-MAT101             |
|       29815 |    27589 |        410 | BB_MS-MAT150             |
|       29815 |    27589 |        411 | BB_MS-MAT201             |
+-------------+----------+------------+--------------------------+

The following SELECT lets me determine those cases where a customer has ordered a product more than once...

select orders1.customer_id as dupe_customer_id, items1.product_id as dupe_product_id, count(*) as duplicates
from orders orders1, order_items items1
where orders1.status = 'complete' 
    and orders1.customer_id is not null
    and orders1.entity_id = items1.order_id
group by orders1.customer_id, items1.product_id
having duplicates > 1;

With results of...

+------------------+-----------------+------------+
| dupe_customer_id | dupe_product_id | duplicates |
+------------------+-----------------+------------+
|            29814 |             385 |          2 |
|            29815 |             384 |          3 |
|            29815 |             402 |          3 |
+------------------+-----------------+------------+

What I am trying to puzzle through is how to combine these so that I only get those items in the FIRST select where they meet the conditions of the second select, such that the output would be something like...

+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku                      |
+-------------+----------+------------+--------------------------+
|       29814 |    27616 |        385 | BB_MS-ACC102             |
|       29814 |    27615 |        385 | BB_MS-ACC102             |
|       29815 |    27592 |        384 | BB_MS-ACC101             |
|       29815 |    27593 |        384 | BB_MS-ACC101             |
|       29815 |    27594 |        384 | BB_MS-ACC101             |
|       29815 |    27592 |        402 | BB_MS-ECON101            |
|       29815 |    27593 |        402 | BB_MS-ECON101            |
|       29815 |    27594 |        402 | BB_MS-ECON101            |
+-------------+----------+------------+--------------------------+

Note that the main criteria of the result set is that any combination of customer_id AND product_id MUST occur more than once.

I'm struggling with how to combine them. My attempts have resulted in one (or more) of the columns have incorrect data repeated across every row in the results.

I've hit the wall with my MySQL knowledge on doing this type of SELECT and searches over the past hour have not yielded anything.

Can you not use the 2nd query as a subselect in the join clause of the first query? select a.col1, b.col2 ... from table a ... join (select ... ) as b ... works in MySQL, as far as I know. Probably quite inefficient, but once you've got a working query, you can work on indexes, if needed. — Paul Hicks
– Paul Hicks, Commented Feb 18, 2014 at 20:44
it's kind of a bit complicated understanding that kind of queries from a comment, why don't you write it down? — Félix Adriyel Gagnon-Grenier
– Félix Adriyel Gagnon-Grenier, Commented Feb 18, 2014 at 20:45

PlantTheIdea · Accepted Answer · 2014-02-18 20:57:58Z

1

Try giving something like this a try:

SELECT orders.customer_id
    ,items.order_id
    ,items.product_id
    ,items.sku 
FROM orders AS orders
INNER JOIN order_items AS items ON orders.entity_id = items.order_id
WHERE (orders.customer_id,items.product_id) IN (
    SELECT orderSub.customer_id
        ,itemSub.product_id
    FROM orders AS orderSub
    INNER JOIN order_items AS itemSub ON orderSub.entity_id = itemSub.order_id
    WHERE orderSub.status = 'complete' 
        AND orderSub.customer_id IS NOT NULL
    GROUP BY orderSub.customer_id
        ,itemSub.product_id
    HAVING COUNT(*) > 1;
);

It is providing the query as a subquery for a multi-column match for the IN clause to work with. This is off the top of my head as I'm not sure if MySQL supports this syntax, but it works in Teradata and DB2.

I also converted your query to maximize ANSI compliance for performance reasons, so if it looks different other than the WHERE IN clause that is why.

edited Feb 18, 2014 at 20:57

answered Feb 18, 2014 at 20:52

PlantTheIdea

16.4k5 gold badges37 silver badges40 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Rob Emenecker Over a year ago

This was a great push in the right direction, thanks PlantTheldea. I found that I needed to create a temporary table for the innermost join, otherwise the query ran for hours with no end in sight.

Rob Emenecker · Accepted Answer · 2014-02-23 17:33:14Z

Here is what I eventually came up with, with the help of PlantTheldea's solution. My solution has additional left joins in place to provide the full report that I was looking for. (They were not relevant to the original question. However, it is easier to just include them here, rather than try to edit them out.)

CREATE TEMPORARY TABLE temp_customer_items 
    (PRIMARY KEY my_pkey (customer_id, product_id)) 
SELECT orderSub.customer_id 
    , itemSub.product_id 
FROM orders AS orderSub
INNER JOIN items AS itemSub 
    ON orderSub.entity_id = itemSub.order_id 
WHERE orderSub.entity_id = itemSub.order_id 
    AND orderSub.customer_id IS NOT NULL 
    AND orderSub.status = 'complete'  
    AND itemSub.product_type = 'simple' 
GROUP BY orderSub.customer_id 
    , itemSub.product_id 
HAVING COUNT(*) > 1;
SELECT orders.customer_id AS student_id
    , fname.value AS first_name 
    , mname.value AS middle_name 
    , lname.value AS last_name 
    , customers.email 
    , items.product_id 
    , items.sku AS product_sku
    , orders.increment_id AS order_id 
    , orders.updated_at AS order_date
FROM orders AS orders 
LEFT JOIN customers AS customers 
    ON orders.customer_id = customers.entity_id 
INNER JOIN items AS items  
    ON orders.entity_id = items.order_id 
    WHERE (orders.customer_id,items.product_id) IN ( 
        SELECT temp_customer_items.customer_id 
            , temp_customer_items.product_id 
        FROM temp_customer_items )
ORDER BY last_name ASC 
    , first_name ASC 
    , middle_name ASC 
    , orders.updated_at DESC 
    , items.order_id ASC;
DROP TEMPORARY TABLE temp_customer_items;

Collectives™ on Stack Overflow

mysql select for simple join for rows having certain cells with duplicate values

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related