SQL query to find all linked rows

Question

I have a table "users":

drop table users;
CREATE TABLE users (
  id int,
  user_id int,
  phone_number VARCHAR(30),
  email VARCHAR(30)
);

INSERT INTO users
VALUES
(1, 999, 61412308310, '[email protected] '),
(2, 129, 61477708777, '[email protected] '),
(3, 213, 61488908495, '[email protected]'),
(4, 145, 61477708777, '[email protected]'),
(5, 214, 61421445777, '[email protected]'),
(6, 214, 61421445326, '[email protected]');

I want to select all rows that have duplicate user_id or duplicate phone_number or duplicate email.

For a search on phone_number = 61477708777, the result should be:

2, 129, 61477708777, '[email protected] '
4, 145, 61477708777, '[email protected]'
5, 214, 61421445777, '[email protected]'
6, 214, 61421445326, '[email protected]'

Rows id = 2 and id = 4 match the search term (phone_number = 61477708777). Row id = 5 has the same email as row id = 4, and row id = 6 has the same user_id as row id = 5.

"It takes any value in columns" Who or what is "it"? What exactly is your question and where is your code?
– Jonas Metzler, Commented Jan 12, 2023 at 12:24
If I select by the phone column with the value 61477708777, then the first 2 rows will be:
2, 129, 61477708777, '[email protected] '
4, 145, 61477708777, '[email protected]'
select key , id , phone , email from us where phone = '61477708777'
Since id = 4 has the same email as id = 5, the row with id 5 needs to be added as well. Since id = 5 has the same user_id as id = 6, row 6 needs to be added to the result as well.
– wdad asd, Commented Jan 12, 2023 at 12:45
The search value can come from any column (id, user_id, phone_number, email).
– wdad asd, Commented Jan 12, 2023 at 12:47
Just to check, you want to recurse as far as possible? So, you'd get the whole of the following chain? (not based on your sample data) id1 shares an email with id2 who shares phone number with id3 who shares email with id4 who shares userid with id5 who shares phone number with id6, etc, etc, etc?
– MatBailie, Commented Jan 12, 2023 at 12:50
@MatBailie Yes. But, for example, if I need to select all related records by user_id = 999, then I should get only 1 row, (1, 999, 61412308310, '[email protected] '), since there are no matching phone numbers or emails.
– wdad asd, Commented Jan 12, 2023 at 12:53

Tomáš Záluský · Accepted Answer · 2023-01-12 13:29:10Z

2

A recursive query is what you need. It lets you express declaratively the reasoning for adding further rows to a given seed row:

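with recursive r (id, user_id, phone_number, email) as (
  -- anchor part: the seed rows
  select u.id, u.user_id, u.phone_number, u.email
  from users u
  where u.phone_number = '61477708777' -- or any initial condition (phone_number is a varchar, hence the quotes)
  union -- union (not union all) discards rows that were already found
  -- recursive part: rows linked to any row found so far
  select u.id, u.user_id, u.phone_number, u.email
  from r
  join users u on (
    r.email = u.email
    or r.user_id = u.user_id
    -- or add whatever condition
  )
)
select * from r;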
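
To try the idea end to end, here is a minimal sketch that runs the same recursive query from Python's standard sqlite3 module. Several things in it are assumptions rather than part of the answer: SQLite as the engine, the helper names make_db and linked_ids, the extra phone_number join condition (one way to fill in "add whatever condition"), and the e-mail addresses, which are invented placeholders that only reproduce the sharing pattern described in the question (ids 4 and 5 share an address).

```python
import sqlite3

# Hypothetical sample data: the e-mail addresses are placeholders made up for
# this sketch; only the pattern "ids 4 and 5 share an e-mail" follows the question.
ROWS = [
    (1, 999, "61412308310", "a@example.com"),
    (2, 129, "61477708777", "b@example.com"),
    (3, 213, "61488908495", "c@example.com"),
    (4, 145, "61477708777", "d@example.com"),
    (5, 214, "61421445777", "d@example.com"),
    (6, 214, "61421445326", "e@example.com"),
]

# {seed} is filled in with a whitelisted column comparison by linked_ids().
LINKED_SQL = """
with recursive r (id, user_id, phone_number, email) as (
  select u.id, u.user_id, u.phone_number, u.email
  from users u
  where {seed}
  union
  select u.id, u.user_id, u.phone_number, u.email
  from r
  join users u on (
    r.email = u.email
    or r.user_id = u.user_id
    or r.phone_number = u.phone_number
  )
)
select id from r order by id
"""


def make_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table users ("
        "id int, user_id int, phone_number varchar(30), email varchar(30))"
    )
    conn.executemany("insert into users values (?, ?, ?, ?)", ROWS)
    return conn


def linked_ids(conn, column, value):
    """Return the ids of all rows linked, directly or indirectly, to the seed."""
    # The column name is checked against a fixed list; the value is bound.
    if column not in {"id", "user_id", "phone_number", "email"}:
        raise ValueError(f"unknown column: {column}")
    sql = LINKED_SQL.format(seed=f"u.{column} = ?")
    return [row[0] for row in conn.execute(sql, (value,))]


conn = make_db()
print(linked_ids(conn, "phone_number", "61477708777"))  # [2, 4, 5, 6]
print(linked_ids(conn, "user_id", 999))                 # [1]
```

The second call covers the case from the question's comments: user_id = 999 has no matching phone number or e-mail, so only the seed row comes back.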


edited Jan 12, 2023 at 13:29

answered Jan 12, 2023 at 13:14


MatBailie Over a year ago

ROTFL I honestly did not realise that using UNION removed 'duplicates' before the next recursion. Much simpler than what I was in the middle of coding to track previously visited nodes (to protect against cyclic graphs). Proof this works even if cycles are present: dbfiddle.uk/aG6lTc3i

wdad asd Over a year ago

at the end you need select * from r

Tomáš Záluský Over a year ago

@wdadasd Thanks, it was mistakenly deleted when I trimmed the automatic fiddle markdown output.

Tomáš Záluský Over a year ago

@MatBailie Yes, sometimes eliminating duplicates in a recursive CTE annoys me (distinct not being allowed in the recursive part, window function hacks, array hacks etc.). Fortunately it seems that was not the OP's case. :-)

MatBailie Over a year ago

Array hacks to track previously visited nodes was where I was going. OOOPS :)
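
The point made in these comments, that UNION discards rows already found so a cyclic chain of links still terminates, can be checked with a small sketch. Everything here is an assumption for illustration: SQLite through Python's sqlite3 module, a hypothetical three-row table whose links form a cycle (1 and 2 share an e-mail, 2 and 3 share a user_id, 3 and 1 share a phone number), and the names CYCLE_SQL and cycle_ids.

```python
import sqlite3

# Hypothetical data whose links form a cycle:
# 1 -(email)- 2 -(user_id)- 3 -(phone_number)- 1
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table users ("
    "id int, user_id int, phone_number varchar(30), email varchar(30))"
)
conn.executemany(
    "insert into users values (?, ?, ?, ?)",
    [
        (1, 10, "111", "x@example.com"),
        (2, 20, "222", "x@example.com"),
        (3, 20, "111", "y@example.com"),
    ],
)

CYCLE_SQL = """
with recursive r (id, user_id, phone_number, email) as (
  select u.id, u.user_id, u.phone_number, u.email
  from users u
  where u.id = 1
  union  -- a row that is already in r is discarded, so it is not expanded again
  select u.id, u.user_id, u.phone_number, u.email
  from r
  join users u on (
    r.email = u.email
    or r.user_id = u.user_id
    or r.phone_number = u.phone_number
  )
)
select id from r order by id
"""

# The query finishes even though every row links back to an earlier one.
cycle_ids = [row[0] for row in conn.execute(CYCLE_SQL)]
print(cycle_ids)  # [1, 2, 3]
```

With UNION ALL in place of UNION the same rows would be produced again on every round, which is why tracking visited nodes is needed in that variant.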


Murmulodi · Answer · 2023-01-12 13:24:00Z

0

This is a problem of joining the table with itself multiple times using different conditions.

Try:

SELECT a.*
FROM users a
JOIN (SELECT user_id, COUNT(*)
      FROM users
      GROUP BY user_id
      HAVING COUNT(*) > 1) b
  ON a.user_id = b.user_id
UNION
SELECT a.*
FROM users a
JOIN (SELECT email, COUNT(*)
      FROM users
      GROUP BY email
      HAVING COUNT(*) > 1) b
  ON a.email = b.email

This query first selects the rows whose user_id occurs more than once, then adds the rows whose email occurs more than once. You can add another UNION branch for phone_number as well.
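
As a rough sketch of what the query looks like with the phone_number branch added, the following runs it through Python's sqlite3 module. The assumptions are SQLite as the engine, invented placeholder e-mail addresses (only ids 4 and 5 share one), and two hypothetical extra rows, ids 7 and 8, that share a user_id but have no link to the rest of the sample data.

```python
import sqlite3

# Hypothetical data: placeholder e-mails, plus rows 7 and 8, which are not in
# the question and were added to show an unrelated duplicate pair.
ROWS = [
    (1, 999, "61412308310", "a@example.com"),
    (2, 129, "61477708777", "b@example.com"),
    (3, 213, "61488908495", "c@example.com"),
    (4, 145, "61477708777", "d@example.com"),
    (5, 214, "61421445777", "d@example.com"),
    (6, 214, "61421445326", "e@example.com"),
    (7, 300, "61400000001", "f@example.com"),
    (8, 300, "61400000002", "g@example.com"),
]

DUPLICATES_SQL = """
select a.* from users a
join (select user_id from users group by user_id having count(*) > 1) b
  on a.user_id = b.user_id
union
select a.* from users a
join (select email from users group by email having count(*) > 1) b
  on a.email = b.email
union
select a.* from users a
join (select phone_number from users
      group by phone_number having count(*) > 1) b
  on a.phone_number = b.phone_number
order by 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table users ("
    "id int, user_id int, phone_number varchar(30), email varchar(30))"
)
conn.executemany("insert into users values (?, ?, ?, ?)", ROWS)

# Every row that shares a user_id, e-mail or phone number with another row.
duplicate_ids = [row[0] for row in conn.execute(DUPLICATES_SQL)]
print(duplicate_ids)  # [2, 4, 5, 6, 7, 8]
```

Because the query has no seed condition, it returns every group of duplicates in the table, including rows 7 and 8, rather than only the rows reachable from one starting value.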


edited Jan 12, 2023 at 13:24

answered Jan 12, 2023 at 12:56


MatBailie Over a year ago

That only recurses once. The OP described at least two levels of recursion, and may require unlimited recursion (find the whole connected 'web' of users where at least one node has phone_number = '61477708777').

Murmulodi Over a year ago

@MatBailie What do you mean by "recurses once"? The OP wants to select all rows that have a duplicate user_id, a duplicate phone_number or a duplicate email.

MatBailie Over a year ago

It's not searching for duplicates, it's searching for associations. WHERE phone_number = '61477708777' gets ids 2 and 4. id 4 shares an email with id 5, so that's added to the result. id 5 shares a userid with id 6, so that's added to the result. So, that initial where clause yields users [2,4,5,6].

Murmulodi Over a year ago

@MatBailie Selecting all rows where phone_number = '61477708777' is the same as selecting all rows that have the same phone_number, which means they are duplicates.

MatBailie Over a year ago

Just look at this answer and figure out what it does differently than yours.
