Removing Rows with Duplicate Values Across Multiple Columns

Question

I have a query like the one below:

SELECT
    c.id,
    c.user,
    c1.user,
    c2.user
FROM (
    SELECT
        id,
        user
    FROM
        table_x
) c
INNER JOIN table_x c1 ON c.id = c1.id AND c.user = 'steve'
INNER JOIN table_x c2 ON c.id = c2.id AND c1.user = 'rob'
INNER JOIN table_x c3 ON c.id = c3.id AND c2.user LIKE 'r%'
GROUP BY c.id, c.user, c1.user, c2.user

And it can produce a result set like:

id | user | user | user
1    steve  rob    rob52
1    steve  rob    rob

I need the result set to exclude the second row, where the same user appears in two columns. Is there a way to check for this without a WHERE clause that compares every individual combination of columns? When the result set grows to something like 6 columns, that would be too many comparisons to write.

It is also possible for a result set to come back as:

id | user | user | user
1    rob    steve  rob

A comparison with the <> or != operator at join time would therefore not catch a row like this, although it could catch row 2 in the result set above.

Thanks

That row is not, per se, a duplicate, but it is an issue. You could do a TOP 1, but why would the first row returned have preference over the other? Are you trying to avoid users with numbers in them?
– Elias, Commented Oct 16, 2013 at 19:21
The first row does not have preference; I just wrote it that way, and the rows could easily have been flipped. I'm not trying to avoid users with numbers in them; I'm trying to avoid duplicate names being joined together, as in row 2 of the above set. The 2nd and 3rd users in the second row are both rob, which is the same user, so that row must not exist in the result set after the join.
– Daniel, Commented Oct 16, 2013 at 22:01
I have a hard time understanding what the objective of that statement is. Maybe if you described your underlying problem, we could find a better way of writing this query. (Btw: the derived table c is totally useless; it can be replaced with FROM table_x AS c.)
– user330315, Commented Aug 17, 2014 at 11:23

Jeroen · Accepted Answer · 2014-01-01 16:44:17Z

1

You could use DISTINCT ON:

SELECT DISTINCT ON (c.id, c.user, c1.user)
    c.id,
    c.user,
    c1.user,
    c2.user
FROM (
    SELECT
        id,
        user
    FROM
        table_x
) c
INNER JOIN table_x c1 ON c.id = c1.id AND c.user = 'steve'
INNER JOIN table_x c2 ON c.id = c2.id AND c1.user = 'rob'
INNER JOIN table_x c3 ON c.id = c3.id AND c2.user LIKE 'r%'
GROUP BY c.id, c.user, c1.user, c2.user

This way you will get only one record for each distinct combination of the columns listed in the DISTINCT ON clause.
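As a rough illustration of that behaviour, here is a minimal sketch assuming PostgreSQL. The inline VALUES list and the column names u1, u2 and u3 are hypothetical stand-ins for the joined result shown in the question, not part of the original schema.

```sql
-- Hypothetical sample rows standing in for the joined result set.
-- DISTINCT ON keeps one row per (id, u1, u2) combination; the ORDER BY
-- decides which row of each group survives.
SELECT DISTINCT ON (id, u1, u2)
       id, u1, u2, u3
FROM (VALUES
        (1, 'steve', 'rob', 'rob52'),
        (1, 'steve', 'rob', 'rob')
     ) AS t(id, u1, u2, u3)
ORDER BY id, u1, u2, u3;
```

Both sample rows share the same (id, u1, u2), so a single row comes back. Nothing in DISTINCT ON compares u2 with u3 inside a row; it only collapses rows that match each other on the listed columns.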


2 Comments

Daniel Over a year ago

Sorry, this doesn't answer the question. I need all of the columns within a row to be distinct from each other. The above only ensures that each row is distinct from the other rows in the set.

Daniel Over a year ago

Look at the initial question. The row that says steve | rob | rob has c1.user and c2.user that are equal in the same row. The row would need to be excluded from the result set.

Chris Travers · Accepted Answer · 2014-08-17 10:55:13Z

0

What's wrong with just:

WHERE c.user <> c1.user AND c1.user <> c2.user AND c2.user <> c.user

That seems to do exactly what you are looking for. I think you are overthinking the problem.
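If the number of user columns grows and writing every pair becomes tedious, one possible alternative is to put the columns into an array and count the distinct values. This is only a sketch, assuming PostgreSQL (which the DISTINCT ON answer implies); the VALUES list and the names u1, u2 and u3 are hypothetical sample data rather than the real table.

```sql
-- Hypothetical sample rows, including both problem cases from the question.
SELECT id, u1, u2, u3
FROM (VALUES
        (1, 'steve', 'rob',   'rob52'),
        (1, 'steve', 'rob',   'rob'),
        (1, 'rob',   'steve', 'rob')
     ) AS t(id, u1, u2, u3)
-- Keep a row only when every value in it is different:
-- the number of distinct values must equal the number of columns.
WHERE (SELECT count(DISTINCT u) = count(*)
       FROM unnest(ARRAY[u1, u2, u3]) AS x(u));
-- Only (1, 'steve', 'rob', 'rob52') passes the filter.
```

Adding a column then means adding one element to the array instead of a new comparison against every existing column. Note that count(DISTINCT u) ignores NULLs while count(*) does not, so a row containing a NULL would also be filtered out by this version.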
