
I have a table "users":

DROP TABLE IF EXISTS users;
CREATE TABLE users(
id int,
user_id int,
phone_number VARCHAR(30),
email VARCHAR(30));
INSERT INTO users
VALUES
(1, 999, '61412308310', '[email protected] '),
(2, 129, '61477708777', '[email protected] '),
(3, 213, '61488908495', '[email protected]'),
(4, 145, '61477708777', '[email protected]'),
(5, 214, '61421445777', '[email protected]'),
(6, 214, '61421445326', '[email protected]');

I want to select all rows that have duplicate user_id or duplicate phone_number or duplicate email.

The result should be:

2, 129, 61477708777, '[email protected] '
4, 145, 61477708777, '[email protected]'
5, 214, 61421445777, '[email protected]'
6, 214, 61421445326, '[email protected]'

Rows id = 2 and id = 4 match the search term (phone_number = 61477708777). Row id = 5 has the same email as row id = 4, and row id = 6 has the same user_id as row id = 5.
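The rule described here is a transitive closure: start from the rows matching the search term, then keep adding any row that shares a user_id, phone_number or email with a row already found. Below is a minimal Python sketch of that rule outside the database, assuming the sample data above. The email addresses are obscured in the question, so the ones used here are hypothetical placeholders chosen so that only rows 4 and 5 share an address; the function name related_ids is also illustrative.

```python
from collections import deque

# Sample rows (id, user_id, phone_number, email) from the question.
# The emails are HYPOTHETICAL placeholders: only rows 4 and 5 share one.
ROWS = [
    (1, 999, "61412308310", "a@example.com"),
    (2, 129, "61477708777", "b@example.com"),
    (3, 213, "61488908495", "c@example.com"),
    (4, 145, "61477708777", "d@example.com"),
    (5, 214, "61421445777", "d@example.com"),
    (6, 214, "61421445326", "e@example.com"),
]


def related_ids(rows, seed):
    """Return the ids of all rows reachable from the rows matching `seed`,
    where two rows are linked if they share user_id, phone_number or email."""
    frontier = deque(r for r in rows if seed(r))
    seen = {r[0] for r in frontier}
    while frontier:
        cur = frontier.popleft()
        for other in rows:
            if other[0] in seen:
                continue
            # Positions 1..3 hold user_id, phone_number, email.
            if any(cur[i] == other[i] for i in (1, 2, 3)):
                seen.add(other[0])
                frontier.append(other)
    return sorted(seen)


print(related_ids(ROWS, lambda r: r[2] == "61477708777"))  # [2, 4, 5, 6]
print(related_ids(ROWS, lambda r: r[1] == 999))            # [1]
```

Each row is enqueued at most once, so the loop ends even though rows 4 and 5 point back at each other.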

  • "It takes any value in columns" Who or what is "it"? What exactly is your question and where is your code? Commented Jan 12, 2023 at 12:24
  • If I select by the phone column with the value 61477708777, the first 2 rows will be: 2, 129, 61477708777, '[email protected] ' and 4, 145, 61477708777, '[email protected]' (select key, id, phone, email from us where phone = '61477708777'). Since id = 4 has the same email as id = 5, you need to add the row with id 5 as well; since id = 5 has the same user_id as id = 6, you need to add row 6 to the query result as well. Commented Jan 12, 2023 at 12:45
  • The values to select by can come from any column (id, user_id, phone_number, email). Commented Jan 12, 2023 at 12:47
  • Just to check, you want to recurse as far as possible? So, you'd get the whole of the following chain? (not based on your sample data) id1 shares an email with id2 who shares phone number with id3 who shares email with id4 who shares userid with id5 who shares phone number with id6, etc, etc, etc? Commented Jan 12, 2023 at 12:50
  • @MatBailie Yes. But if, for example, I need to select all related records for user_id = 999, the result should be only 1 row, (1, 999, 61412308310, '[email protected] '), since there are no matching phone numbers or emails. Commented Jan 12, 2023 at 12:53

2 Answers

A recursive query is what you need. It lets you declaratively express the rule for adding further rows to a given seed row:

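with recursive r (id, user_id, phone_number, email) as (
  -- seed rows
  select u.id, u.user_id, u.phone_number, u.email
  from users u
  where u.phone_number = '61477708777' -- or any initial condition
  union -- union (not union all) discards rows already in r, so cycles terminate
  -- recursive step: add every row that shares a value with a row already in r
  select u.id, u.user_id, u.phone_number, u.email
  from r
  join users u on (
    r.email = u.email
    or r.user_id = u.user_id
    or r.phone_number = u.phone_number
    --or add whatever condition
  )
)
select * from r;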
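To check the query end to end, here is a sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption (the answer's fiddle engine is not stated), but it supports WITH RECURSIVE with UNION in the same way. The join also matches on phone_number, one of the "whatever condition" extensions the answer invites, and the seed column is a parameter so the same query covers the user_id = 999 case from the comments. The helper name related and the email addresses are hypothetical.

```python
import sqlite3

# In-memory demo database. Emails are HYPOTHETICAL placeholders: the originals
# are obscured, so only rows 4 and 5 are given the same address.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT, user_id INT, phone_number VARCHAR(30), email VARCHAR(30));
INSERT INTO users VALUES
  (1, 999, '61412308310', 'a@example.com'),
  (2, 129, '61477708777', 'b@example.com'),
  (3, 213, '61488908495', 'c@example.com'),
  (4, 145, '61477708777', 'd@example.com'),
  (5, 214, '61421445777', 'd@example.com'),
  (6, 214, '61421445326', 'e@example.com');
""")

# Column names cannot be bound as parameters, so the seed column is checked
# against an allow-list before being formatted into the SQL text.
SEED_COLUMNS = {"id", "user_id", "phone_number", "email"}

QUERY = """
WITH RECURSIVE r (id, user_id, phone_number, email) AS (
  SELECT u.id, u.user_id, u.phone_number, u.email
  FROM users u
  WHERE u.{column} = ?
  UNION  -- UNION, not UNION ALL: rows already in r are not added again
  SELECT u.id, u.user_id, u.phone_number, u.email
  FROM r
  JOIN users u ON (
    r.email = u.email
    OR r.user_id = u.user_id
    OR r.phone_number = u.phone_number
  )
)
SELECT id FROM r ORDER BY id
"""


def related(conn, column, value):
    """Return the ids of all rows connected to the rows where column = value."""
    if column not in SEED_COLUMNS:
        raise ValueError(f"unsupported seed column: {column}")
    sql = QUERY.format(column=column)
    return [row[0] for row in conn.execute(sql, (value,))]


print(related(conn, "phone_number", "61477708777"))  # [2, 4, 5, 6]
print(related(conn, "user_id", 999))                  # [1]
```

The search value itself is always passed as a bound parameter; only the column name goes through the allow-list.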


7 Comments

ROTFL I honestly did not realise that using UNION removed 'duplicates' before the next recursion. Much simpler than what I was in the middle of coding to track previously visited nodes (to protect against cyclic graphs) Proof this works even if cycles are present: dbfiddle.uk/aG6lTc3i
at the end you need select * from r
@wdadasd thanks, it was mistakenly deleted when I condensed the automatically generated fiddle markdown output
@MatBailie yes, eliminating duplicates in a recursive CTE sometimes annoys me (DISTINCT not allowed in the recursive part, window-function hacks, array hacks, etc.). Fortunately it seems that was not the OP's case. :-)
Array hacks to track previously visited nodes were where I was going. OOOPS :)
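Why UNION is enough: PostgreSQL's documentation describes recursive evaluation as a loop over a working table, where UNION (unlike UNION ALL) discards rows that duplicate any earlier result row, and the loop stops once the working table is empty. The following is a simplified Python simulation of that procedure, not any engine's actual implementation; the sample rows use hypothetical emails. Because every row trivially matches itself, the UNION ALL variant never runs out of rows, so it is cut off by a hypothetical max_iterations parameter.

```python
# Sample rows (id, user_id, phone_number, email); emails are HYPOTHETICAL.
ROWS = [
    (1, 999, "61412308310", "a@example.com"),
    (2, 129, "61477708777", "b@example.com"),
    (3, 213, "61488908495", "c@example.com"),
    (4, 145, "61477708777", "d@example.com"),
    (5, 214, "61421445777", "d@example.com"),
    (6, 214, "61421445326", "e@example.com"),
]


def linked(a, b):
    """True if two rows share user_id, phone_number or email (a row links to itself)."""
    return a[1] == b[1] or a[2] == b[2] or a[3] == b[3]


def evaluate(rows, seed, use_union, max_iterations=5):
    """Simplified working-table evaluation of a recursive CTE.

    Returns (result_rows, terminated). With use_union=False (UNION ALL
    semantics) nothing is discarded, so the loop is cut off after
    max_iterations rounds.
    """
    working = [r for r in rows if seed(r)]          # non-recursive term
    if use_union:
        working = list(dict.fromkeys(working))      # drop duplicates, keep order
    result = list(working)
    for _ in range(max_iterations):
        if not working:
            break
        # Recursive term: join the working table against the base table.
        produced = [u for w in working for u in rows if linked(w, u)]
        if use_union:
            seen = set(result)
            fresh = []
            for row in produced:
                if row not in seen:                 # skip rows already in the result
                    seen.add(row)
                    fresh.append(row)
            produced = fresh
        result.extend(produced)
        working = produced                          # next round sees only new rows
    return result, not working


def seed(r):
    return r[2] == "61477708777"


rows_u, done_u = evaluate(ROWS, seed, use_union=True)
print(sorted(r[0] for r in rows_u), done_u)   # [2, 4, 5, 6] True
rows_a, done_a = evaluate(ROWS, seed, use_union=False)
print(done_a)                                 # False
```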

This can be done by joining the table to itself several times, once per column condition, and combining the results with UNION.

Try:

SELECT a.*
FROM users a
JOIN (SELECT user_id
      FROM users
      GROUP BY user_id
      HAVING COUNT(*) > 1) b
  ON a.user_id = b.user_id
UNION
SELECT a.*
FROM users a
JOIN (SELECT email
      FROM users
      GROUP BY email
      HAVING COUNT(*) > 1) b
  ON a.email = b.email;

This query returns every row whose user_id occurs more than once, together with every row whose email occurs more than once; UNION removes rows that qualify both ways. You can add a third UNION branch for phone_number as well.
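A runnable sketch of this approach with the phone_number branch added, using Python's sqlite3 module and an in-memory database (an assumption; the answer's fiddle engine is not stated). The unused COUNT(*) column is left out of the subqueries, ORDER BY 1 is added only to make the output deterministic, and the email addresses are hypothetical placeholders, since the originals are obscured.

```python
import sqlite3

# In-memory demo database; emails are HYPOTHETICAL placeholders
# (only rows 4 and 5 share an address).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT, user_id INT, phone_number VARCHAR(30), email VARCHAR(30));
INSERT INTO users VALUES
  (1, 999, '61412308310', 'a@example.com'),
  (2, 129, '61477708777', 'b@example.com'),
  (3, 213, '61488908495', 'c@example.com'),
  (4, 145, '61477708777', 'd@example.com'),
  (5, 214, '61421445777', 'd@example.com'),
  (6, 214, '61421445326', 'e@example.com');
""")

# One UNION branch per column: each branch keeps rows whose value in that
# column occurs more than once in the table. UNION removes repeated rows.
DUPLICATES = """
SELECT a.* FROM users a
JOIN (SELECT user_id FROM users GROUP BY user_id HAVING COUNT(*) > 1) b
  ON a.user_id = b.user_id
UNION
SELECT a.* FROM users a
JOIN (SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1) b
  ON a.email = b.email
UNION
SELECT a.* FROM users a
JOIN (SELECT phone_number FROM users GROUP BY phone_number HAVING COUNT(*) > 1) b
  ON a.phone_number = b.phone_number
ORDER BY 1
"""

ids = [row[0] for row in conn.execute(DUPLICATES)]
print(ids)  # [2, 4, 5, 6]
```

On this sample the output happens to match the expected result, but the query takes no starting value: it reports every duplicated value in the whole table.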


7 Comments

That only recurses once. The OP described at least two levels of recursion, and may require unlimited recursion (find the whole connected 'web' of users where at least one node has phone_number = '61477708777')
@MatBailie What do you mean by 'recurses once'? The OP wants to select all rows that have a duplicate user_id, duplicate phone_number or duplicate email.
It's not searching for duplicates; it's searching for associations. WHERE phone_number = '61477708777' gets ids 2 and 4. id 4 shares an email with id 5, so that's added to the result. id 5 shares a user_id with id 6, so that's added to the result. So that initial WHERE clause yields users [2,4,5,6]
@MatBailie Selecting all rows where phone_number = '61477708777' is the same as selecting all rows that share the same phone_number, which means they are duplicates.
Just look at this answer and figure out what it does differently than yours.
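The disagreement can be made concrete by adding two hypothetical rows, 7 and 8, that share a user_id with each other but nothing with rows 1 to 6. The small Python comparison below (function names illustrative, emails hypothetical) shows that the duplicate-based idea also returns rows 7 and 8 and ignores the starting value, while the association idea returns only rows connected to the seed, including just [1] for user_id = 999.

```python
from collections import Counter

# Rows 1-6 follow the question (emails HYPOTHETICAL); rows 7 and 8 are
# HYPOTHETICAL extras that share user_id 300 only with each other.
ROWS = [
    (1, 999, "61412308310", "a@example.com"),
    (2, 129, "61477708777", "b@example.com"),
    (3, 213, "61488908495", "c@example.com"),
    (4, 145, "61477708777", "d@example.com"),
    (5, 214, "61421445777", "d@example.com"),
    (6, 214, "61421445326", "e@example.com"),
    (7, 300, "61400000001", "f@example.com"),
    (8, 300, "61400000002", "g@example.com"),
]
COLS = (1, 2, 3)  # positions of user_id, phone_number, email


def duplicate_ids(rows):
    """Duplicate-based idea: rows whose value in any column occurs more than once."""
    counts = {c: Counter(r[c] for r in rows) for c in COLS}
    return sorted(r[0] for r in rows if any(counts[c][r[c]] > 1 for c in COLS))


def associated_ids(rows, seed):
    """Association idea: rows transitively linked to the seed rows."""
    found = {r[0] for r in rows if seed(r)}
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for r in rows:
            if r[0] in found:
                continue
            if any(o[0] in found and r[c] == o[c] for o in rows for c in COLS):
                found.add(r[0])
                changed = True
    return sorted(found)


print(duplicate_ids(ROWS))                                    # [2, 4, 5, 6, 7, 8]
print(associated_ids(ROWS, lambda r: r[2] == "61477708777"))  # [2, 4, 5, 6]
print(associated_ids(ROWS, lambda r: r[1] == 999))            # [1]
```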