
Given a table

CREATE TABLE data(
 irs_number VARCHAR (50),
 mop_up INTEGER,
 ou VARCHAR (50)
);

How would I return all matching records that...

  • have at least one identical value for irs_number in another row AND
  • at least one mop_up of those with the same irs_number must be set to 1 AND
  • the ou values are not identical, i.e. only return those not matching to a row with the identical irs_number.

... so that all irs_numbers would be returned (not only one where the conditions are true - see example below).

I tried this, but the query does not finish within a reasonable time:

SELECT irs_number, mop_up, ou
FROM data outer_data
WHERE (SELECT count(*)
FROM data inner_data
WHERE inner_data.irs_number = outer_data.irs_number
AND inner_data.mop_up = 1 OR outer_data.mop_up = 1
AND inner_data.ou <> outer_data.ou
);

I also tried variations of the duplicate counts described in "How to find duplicate records in PostgreSQL", but they only ever return the duplicates without applying the filter described above.


edit:

Example data:

INSERT INTO data VALUES 
('0001', 1, 'abc'),
('0001', 0, 'abc'),
('0001', 0, 'cde'),
('0001', 0, 'abc'),
('0002', 1, 'abc'),
('0002', 0, 'abc'),
('0003', 0, 'abc'),
('0003', 0, 'xyz')
;

SQLFiddle: http://sqlfiddle.com/#!17/be28f

A query should ideally return:

irs_number  mop_up  ou
-----------------------
0001        1       abc
0001        0       abc
0001        0       cde
0001        0       abc

(Order is not important.) That is, it should return all rows whose irs_number satisfies the conditions above.

2 Answers


You should be able to do this with a simple EXISTS clause:

SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.mop_up = 1 AND
                    d2.ou <> d.ou
             );

EDIT:

The above misinterpreted the question. It assumed that the row with mop_up = 1 needed to be on a different ou. As I read the question, this is ambiguous, but it doesn't appear to be what you want. So, two EXISTS clauses address this:

SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.mop_up = 1
             ) AND
     EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.ou <> d.ou
             );

Here is a db<>fiddle.

Both of these queries can take advantage of an index on (irs_number, mop_up, ou).
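To check both queries against the example data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here (an assumption, made only so the snippet is self-contained), and the index name data_irs_mop_ou_idx is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (irs_number VARCHAR(50), mop_up INTEGER, ou VARCHAR(50));
-- Hypothetical index name; the column order follows the answer's suggestion.
CREATE INDEX data_irs_mop_ou_idx ON data (irs_number, mop_up, ou);
INSERT INTO data VALUES
  ('0001', 1, 'abc'), ('0001', 0, 'abc'), ('0001', 0, 'cde'), ('0001', 0, 'abc'),
  ('0002', 1, 'abc'), ('0002', 0, 'abc'),
  ('0003', 0, 'abc'), ('0003', 0, 'xyz');
""")

# First version: the mop_up = 1 row must itself have a different ou.
single_exists = """
SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1 FROM data d2
              WHERE d2.irs_number = d.irs_number
                AND d2.mop_up = 1
                AND d2.ou <> d.ou)
"""

# Edited version: the two conditions are checked independently.
two_exists = """
SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1 FROM data d2
              WHERE d2.irs_number = d.irs_number AND d2.mop_up = 1)
  AND EXISTS (SELECT 1 FROM data d2
              WHERE d2.irs_number = d.irs_number AND d2.ou <> d.ou)
"""

# Sort the results because the question says order is not important.
single_rows = sorted(conn.execute(single_exists).fetchall())
two_rows = sorted(conn.execute(two_exists).fetchall())

print(single_rows)  # [('0001', 0, 'cde')]
print(two_rows)     # all four '0001' rows
```

On the example data the first version returns only the single cde row, which matches the comment below, while the edited version returns all four 0001 rows.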


6 Comments

This has nothing to do with the requirements of the question.
This would return only those rows with mop_up set to 1, but not all rows of an irs_number where at least one mop_up was set to 1.
@dh762 . . . Not at all. This would return all irs_numbers that have a corresponding row with mop_up = 1 -- which is what you are asking for. I have no idea why you are confusing the where clause in the correlated subquery with what the outer query returns.
Correct, but this would only serve as a subquery, because the final query should return all records with that subqueried irs_number (see the example). Your query returns only 1 row for 0001 instead of all 4. This might not have been obvious from the pre-edit question (will update).
@dh762 . . . I see. I interpreted the second and third bullet points differently from what you intended. I've adjusted the answer.

I think this query will do:

SELECT * FROM data 
WHERE irs_number in (
  SELECT irs_number
  FROM data d
  WHERE EXISTS (SELECT 1
    FROM data 
    WHERE irs_number = d.irs_number
    AND (mop_up = 1 OR d.mop_up = 1)
    AND ou <> d.ou
  )
)

See the demo
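As a quick check of this query against the example data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used in place of PostgreSQL (an assumption, made only to keep the snippet self-contained).

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (irs_number VARCHAR(50), mop_up INTEGER, ou VARCHAR(50));
INSERT INTO data VALUES
  ('0001', 1, 'abc'), ('0001', 0, 'abc'), ('0001', 0, 'cde'), ('0001', 0, 'abc'),
  ('0002', 1, 'abc'), ('0002', 0, 'abc'),
  ('0003', 0, 'abc'), ('0003', 0, 'xyz');
""")

# The inner query finds irs_numbers that have a pair of rows with different ou
# where at least one row of the pair has mop_up = 1; the outer query then
# returns every row for those irs_numbers.
query = """
SELECT * FROM data
WHERE irs_number IN (
  SELECT irs_number
  FROM data d
  WHERE EXISTS (SELECT 1
    FROM data
    WHERE irs_number = d.irs_number
    AND (mop_up = 1 OR d.mop_up = 1)
    AND ou <> d.ou
  )
)
"""

# Sort the result because the question says order is not important.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
```

On the example data this prints the four 0001 rows and nothing for 0002 (single ou) or 0003 (no mop_up = 1).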

7 Comments

It also returns rows that have just a single irs_number, @forpas.
This joins rows with the same irs_number and a different ou, meaning different rows with the same irs_number. Can you post sample data and expected results so that it is clear?
Correction to my comment: it returns only one row if there are duplicate irs_numbers within the same ou (and at least one has mop_up = true). Happy to add sample data.
Why should 0001 0 abc be returned? There is not a row with the same irs_number and different ou with mop_up = 1 to any of the 2 rows.