8

In this Query, I have to list pair of players with their playerID and playerName who play for the exact same teams.If a player plays for 3 teams, the other has to play for exact same 3 teams. No less, no more. If two players currently do not play for any team, they should also be included. The query should return (playerID1, playername1, playerID2, playerName2) with no repetition such as if player 1 info comes before player 2, there should not be another tuple with player 2 info coming before player 1.

For example if player A plays for yankees and redsox, and player b plays for Yankees, Red Sox, and Dodgers I should not get them. They both have to play for Yankees, and Red Sox and no one else. Right now this query finds answer if players play for any same team.

Tables:
player(playerID: integer, playerName: string)
team(teamID: integer, teamName: string, sport: string)
plays(playerID: integer, teamID: integer)

Example data:
PLAYER    
playerID    playerName
1           Rondo
2           Allen
3           Pierce
4           Garnett
5           Perkins

TEAM      
teamID     teamName       sport
1          Celtics        Basketball
2          Lakers         Basketball
3          Patriots       Football
4          Red Sox        Baseball
5          Bulls          Basketball

PLAYS
playerID    TeamID
1           1
1           2
1           3
2           1
2           3
3           1
3           3

So I should get this as answer-

 2, Allen, 3, Pierce 
 4, Garnett, 5, Perkins

.

2, Allen, 3 Pierce is an snwer because both play for exclusively CELTICS and PATRIOTS 4, Garnett, 5, Perkins iss an answer because both players play for no teams which should be in output.

Right now the Query I have is

SELECT p1.PLAYERID, 
       f1.PLAYERNAME, 
       p2.PLAYERID, 
       f2.PLAYERNAME 
FROM   PLAYER f1, 
       PLAYER f2, 
       PLAYS p1 
       FULL OUTER JOIN PLAYS p2 
                    ON p1.PLAYERID < p2.PLAYERID 
                       AND p1.TEAMID = p2.TEAMID 
GROUP  BY p1.PLAYERID, 
          f1.PLAYERID, 
          p2.PLAYERID, 
          f2.PLAYERID 
HAVING Count(p1.PLAYERID) = Count(*) 
       AND Count(p2.PLAYERID) = Count(*) 
       AND p1.PLAYERID = f1.PLAYERID 
       AND p2.PLAYERID = f2.PLAYERID; 

I am not 100% sure but I think this finds players who play for the same team but I want to find out players who play for the exclusively all same TEAMS as explained above

I am stuck on how to approach it after this. Any hints on how to approach this problem. Thanks for your time.

5
  • I understand that you want to find 2 players playing on the exact same teams. Can you clarify what you meant with "no repetition"? Commented Jul 31, 2013 at 6:58
  • What DBMS are you using? Commented Jul 31, 2013 at 6:59
  • A player can play for multiple teams. I want players who play for same exact teams. Lets say player A plays for Bulls, Magics, Celtics and player B plays for bulls, magics, celtics, heat. They don't count because heat is not a team player A plays for. I want to find two distnct players who play for exactly same teams, nothing less nothing more. Also if two players currently don't play for any teams which is possible. They should be in the output. Another thing, let's say team A and team B match. The output should either be A, B or B, A and not both. If any more questions do let me know. Commented Jul 31, 2013 at 7:02
  • I am doing it on postgres using SQL. Commented Jul 31, 2013 at 7:02
  • Your joins are totally wrong. You should not mix implicit joins in the having clause with explicit JOINs in from clause (and if at all: the implicit join should be done in the WHERE clause Commented Nov 9, 2013 at 23:27

16 Answers 16

4

I believe this query will do what you want:

SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT
      p.playerid AS t1id,
      concat(p.playerid,':', p.playername, ' ') AS t1player,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
INNER JOIN (
  SELECT
    p.playerid AS t2id,
    array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
  FROM player p
  LEFT JOIN plays pl ON p.playerid = pl.playerid
  GROUP BY p.playerid, p.playername
) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams


Result:
PLAYERS               PLAYER_TEAMS
2:Allen,3:Pierce      1,3
4:Garnett,5:Perkins

It uses array_agg over the teamid for each player in plays to match players with the exact same team configuration. I Included a column with the teams for example, but that can be removed without affecting the results as long as it isn't removed from the group by clause.

SQL Fiddle example.Tested with Postgesql 9.2.4

EDIT: Fixed an error that duplicated rows.

Sign up to request clarification or add additional context in comments.

6 Comments

Is there a way to do this without the use of array_agg and concat as im not familiar with those two?
Then familiarize yourself with these two
@user2632133 Maybe, but why not use the tools provided by postgresql? If you are unfamiliar with the functions just read the manual :)
Because I want this to work outside on postgre also like just on paper. I dont want to rely on postgre.
@user2632133 I'm not sure how to do this in standard ANSI SQL, my guess would be some kind of cursor might work. This works for postgres though, and that is what the question was tagged as.
|
1

Seems that OP probably won't be interested anymore, but in case somebody else finds it useful, this is query in pure SQL that works (for me at least ;))

SELECT M.p1, pr1.playername, M.p2, pr2.playername FROM player pr1 
INNER JOIN player pr2 INNER JOIN
(
   SELECT plays1.player p1, plays2.player p2, plays1.team t1 FROM plays plays1 
   INNER JOIN plays plays2 
   ON (plays1.player < plays2.player AND plays1.team = plays2.team)
   GROUP BY plays1.player, plays2.player HAVING COUNT(*) = 
((SELECT COUNT(*) FROM plays plays3 WHERE plays3.player = plays1.player) + 
(SELECT COUNT(*) FROM plays plays4 WHERE plays4.player = plays2.player)) /2
) M ON pr1.playerID = M.p1 AND pr2.playerID = M.p2 
UNION ALL
SELECT M.pid, M.pname, N.pid2, N.pname2 FROM
(
(SELECT p.playerID pid, p.playerName pname, pl.team FROM player p
 LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) M
 INNER JOIN
 (SELECT p.playerID pid2, p.playerName pname2, pl.team FROM player p
  LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) N 
 ON (pid < pid2)
)

Comments

1

its not any big deal, here is solution

with gigo as(select a.playerid as playerid,count(b.teamname) as nteams from player a 
full outer join plays c on a.playerid=c.playerid full outer join team b 
on b.teamid=c.teamid group by a.playerid)
select array_agg(a.*),g.nteams from player a inner join gigo g on a.playerid=g.playerid 
group by g.nteams having count(a.*)>1 order by g.nteams desc

Comments

1

This solution works for me :

SELECT TMP1. PLAYERID,TMP2.PLAYERID FROM
(
    SELECT a.playerid , a.teamid,b.team_sum 
    FROM plays  A
    INNER JOIN 
    (
        SELECT PLAYERID,SUM(teamid) AS team_sum
        FROM plays
        GROUP BY 1
     ) B

    ON a.playerid=b.playerid
 ) TMP1

INNER JOIN

(
    SELECT a.playerid , a.teamid,b.team_sum
    FROM plays  A

    INNER JOIN 
    (
        SELECT PLAYERID,SUM(teamid) AS team_sum
        FROM plays
        GROUP BY 1
    ) B

ON a.playerid=b.playerid

)TMP2
ON TMP1.PLAYERID < TMP2.PLAYERID
AND TMP1.TEAMID=TMP2.TEAMID
AND TMP1.TEAM_SUM=TMP2.TEAM_SUM
GROUP BY 1,2

UNION ALL
SELECT n1,n2 FROM  
(
    SELECT TMP3.PLAYERID AS n1,TMP4.PLAYERID AS n2 FROM 
    PLAYER  TMP3
    INNER JOIN PLAYER TMP4
    ON TMP3.PLAYERID<TMP4.PLAYERID
    WHERE TMP3.PLAYERID  NOT IN (SELECT  PLAYERID FROM plays  )
    AND tmp4.playerid NOT IN (SELECT playerid FROM plays)
) TMP5

Comments

0

Two possible solutions come to mind:

  1. Cursor - Looping through each player and comparing him to all the others until reaching a conclusion.
  2. Recursive query - Same idea though slightly more complicated but defiantly the better way to do it. Probably also has better performance.

Can you provide some sample data so that I can create an example?

1 Comment

Either use SQL fiddle (sqlfiddle.com) or just add a couple of row of data for each table to your original post.
0

It seems like the basic datatype you want is sets, rather than arrays. So one option may be to use PL/Python with code similar to that below (see bottom of this answer for a function that might be adapted to this end). Of course, this isn't a "pure SQL" approach by any means.

But sticking to PostgreSQL (albeit not standard SQL), you may also want to use DISTINCT with array_agg. Note that the following only gives the first pair that meets the criteria (in principle there could be many more).

WITH teams AS (
  SELECT playerID, array_agg(DISTINCT teamID ORDER BY teamID) AS teams
  FROM plays
  GROUP BY playerID),
teams_w_nulls AS (
  SELECT a.playerID, b.teams
  FROM player AS a
  LEFT JOIN teams AS b
  ON a.playerID=b.playerID),
player_sets AS (
  SELECT teams, array_agg(DISTINCT playerID ORDER BY playerID) AS players
  FROM teams_w_nulls
  GROUP BY teams
  -- exclude players who are only share a team list with themselves.
  HAVING array_length(array_agg(DISTINCT playerID ORDER BY playerID),1)>1)
SELECT a.teams, b.playerID, b.playerName, c.playerID, c.playerName
FROM player_sets AS a
INNER JOIN player AS b
ON a.players[1]=b.playerID
INNER JOIN player AS c
ON a.players[2]=c.playerID;

The query above gives the following output:

 teams | playerid | playername | playerid | playername 
-------+----------+------------+----------+------------
 {1,3} |        2 | Allen      |        3 | Pierce
       |        4 | Garnett    |        5 | Perkins
(2 rows)

Example PL/Python functions:

CREATE OR REPLACE FUNCTION set(the_list integer[])
  RETURNS integer[] AS
$BODY$
    return list(set(the_list))
$BODY$
  LANGUAGE plpython2u;

CREATE OR REPLACE FUNCTION pairs(a_set integer[])
  RETURNS SETOF integer[] AS
$BODY$
    def pairs(x):
        for i in range(len(x)):
            for j in x[i+1:]:
                yield [x[i], j]
    return list(pairs(a_set))
$BODY$
  LANGUAGE plpython2u;

SELECT set(ARRAY[1, 1, 2, 3, 4, 5, 6, 6]);

Version of code above using these functions (output is similar, but this approach selects all pairs when there is more than one for a given set of teams):

WITH teams AS (
  SELECT playerID, set(array_agg(teamID)) AS teams
  FROM plays
  GROUP BY playerID),
teams_w_nulls AS (
  SELECT a.playerID, b.teams
  FROM player AS a
  LEFT JOIN teams AS b
  ON a.playerID=b.playerID),
player_pairs AS (
  SELECT teams, pairs(set(array_agg(playerID))) AS pairs
  FROM teams_w_nulls
  GROUP BY teams)
  -- no need to exclude players who are only share a team 
  -- list with themselves.
SELECT teams, pairs[1] AS player_1, pairs[2] AS player_2
FROM player_pairs;

Comments

0

We make a query with the count of the teams per player and sum of ascii(team_name)+team_id call it team_value. We do a self join, of the same query with itself where counts and team_values match but id not equal to id, that gives us the ID's we want to fetch

select * from player where player_id in 
(
 select set2.player_id orig
 from
 (select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
   from plays a, player b , team c
   where a.player_id(+)=b.player_id
    and a.team_id = c.team_id(+)
   group by b.player_id) set1,
(select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
   from plays a, player b , team c
   where a.player_id(+)=b.player_id
    and a.team_id = c.team_id(+)
   group by b.player_id) set2
where set1.count=set2.count and set1.team_value=set2.team_value
  and set1.player_id<>set2.player_id
)

Comments

0

Here is the simple query with UNION and 2-3 simple joins. 1st query before UNION contains player name and playerid who has played for same number of teams for equal number of times. 2nd query after UNION contains player name and playerid who has not played for any team at all.

Simply copy paste this query and try to execute it, you will see the expected results.

    select playername,c.playerid from 
    (select a.cnt, a.playerid from 
    (select count(1) cnt , PLAYERID from plays group by  PLAYERID) a ,
    (select count(1) cnt , PLAYERID from plays group by  PLAYERID) b 
    where a.cnt=b.cnt
    and  a.playerid<> b.playerid ) c ,PLAYER  d
    where c.playerid=d.playerid
    UNION
    select e.playername,e.playerid 
    from player e 
    left outer join plays f on 
    e.playerid=f.playerid where nvl(teamid,0 )=0

Comments

0

Try this one : Here test is PLAYS table in your question.

select group_concat(b.name),a.teams from
(SELECT playerid, group_concat(distinct teamid ORDER BY teamid) AS teams
  FROM test
  GROUP BY playerid) a, player b
where a.playerid=b.playerid
group by a.teams
union
select group_concat(c.name order by c.playerid),null from player c where c.playerid not in (select        playerid from test);

Comments

0

For anyone interested, this simple query works for me

SELECT UNIQUE PLR1.PID,PLR1.PNAME, PLR2.PID, PLR2.PNAME
FROM PLAYS PLY1,PLAYS PLY2, PLAYER PLR1, PLAYER PLR2
WHERE PLR1.PID < PLR2.PID AND PLR1.PID = PLY1.PID(+) AND PLR2.PID = PLY2.PID(+)
AND NOT EXISTS(( SELECT PLY3.TEAMID FROM PLAYS PLY3 WHERE PLY3.PID = PLR1.PID) 
MINUS
( SELECT PLY4.TEAMID FROM PLAYS PLY4 WHERE PLY4.PID = PLR2.PID));

Comments

0

select p1.playerId, p2.playerId, count(p1.playerId)
from plays p1, plays p2
WHERE p1.playerId<p2.playerId
and p1.teamId = p2.teamId
GROUP BY p1.playerId, p2.playerId
having count(*) = (select count(*) from plays where playerid = p1.playerid)

Comments

0
WITH temp AS (
  SELECT p.playerid, p.playername, listagg(t.teamname,',') WITHIN GROUP (ORDER BY t.teamname) AS teams
  FROM player p full OUTER JOIN plays p1 ON p.playerid = p1.playerid
    LEFT JOIN team t ON p1.teamid = t.teamid GROUP BY (p.playerid , p.playername))
SELECT concat(concat(t1.playerid,','), t1.playername), t1.teams 
FROM temp t1 WHERE nvl(t1.teams,' ') IN (
  SELECT nvl(t2.teams,' ') FROM temp t2 
  WHERE t1.playerid <> t2.playerid) 
ORDER BY t1.playerid

Comments

0

This is ANSI SQL , without using any special functions.

SELECT   TAB1.T1_playerID AS playerID1 , TAB1.playerName1  ,   
  TAB1.T2_playerID AS playerID2, TAB1. playerName2
 FROM
(select   T1.playerID AS T1_playerID ,  T3. playerName  AS  playerName1 ,

T2.playerID  AS T2_playerID ,  T4. playerName AS playerName2  ,COUNT (T1.TeamID) AS MATCHING_TEAM_ID_CNT
FROM PLAYS T1
INNER JOIN PLAYS T2  ON(  T1.TeamID = T2.TeamID AND T1.playerID <> T2.playerID )
INNER JOIN player T3 ON (  T1.playerID=T3.playerID)
INNER JOIN player T4 ON (  T2.playerID=T4.playerID)
 GROUP BY 1,2,3,4
) TAB1

INNER JOIN 
( SELECT  T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
 FROM PLAYS  T1
GROUP BY T1.playerID) TAB2
ON(TAB1.T2_playerID=TAB2.playerID AND    
  TAB1.MATCHING_TEAM_ID_CNT =TAB2.TOTAL_TEAM_CNT)

INNER JOIN 
( SELECT  T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
FROM PLAYS  T1
GROUP BY T1.playerID 
) TAB3
ON( TAB1. T1_playerID = TAB3.playerID  AND 
 TAB1.MATCHING_TEAM_ID_CNT=TAB3.TOTAL_TEAM_CNT)
WHERE playerID1  < playerID2

    UNION ALL (
    SELECT   T1.playerID, T1.playerName ,T2.playerID,T2.playerName
    FROM
    PLAYER T1 INNER JOIN PLAYER T2
    ON (T1.playerID<T2.playerID) 
    WHERE T1.playerID NOT IN ( SELECT playerID FROM PLAYS))

Comments

0

Assuming your teamId is unique this query will work. It simply identifies all players that have the exact same teams by summing the teamid or if the player has no ids it will be null. Then counts the number of matches over team matches. I tested using SQL fiddle in postgre 9.3.

SELECT 
     b.playerID
    ,b.playerName
FROM (
--Join the totals of teams to your player information and then count over the team matches.
        SELECT 
                p.playerID
                ,p.playerName
                ,m.TeamMatches
                ,COUNT(*) OVER(PARTITION BY TeamMatches) as Matches
        FROM player p
                LEFT JOIN (
                --Assuming your teamID is unique as it should be. If it is then a sum of the team ids for a player will give you each team they play for. 
                --If for some reason your team id is not unique then rank the table and join same as below. 
                    SELECT 
                         ps.playerName
                        ,ps.playerID
                        ,SUM(t.teamID) as TeamMatches
                    FROM plays p
                            LEFT JOIN team t ON p.teamID = p.teamID
                            LEFT JOIN player ps ON p.playerID = ps.playerID
                    GROUP BY 
                            ps.playerName
                        ,ps.playerID
                ) m ON p.playerID = m.playerID
) b
WHERE
b.Matches <> 1

Comments

-1

This Query should solve it. By doing a self join on PLAYS. - Compare on the player Id - Compare the matching row count with the total count for each player.

select p1.playerId, p2.playerId, count(p1.playerId)
from plays p1, plays p2
WHERE p1.playerId<p2.playerId
and p1.teamId = p2.teamId
GROUP BY p1.playerId, p2.playerId
having count(*) = (select count(*) from plays where playerid = p1.playerid)

1 Comment

please use 'join' and 'on' instead of the old one.
-2

Create function in SQl 2008

ALTER FUNCTION [dbo].[fngetTeamIDs] ( @PayerID int ) RETURNS varchar(101) AS Begin

declare @str varchar(1000)

SELECT @str= coalesce(@str + ', ', '') + CAST(a.TeamID AS varchar(100)) FROM (SELECT DISTINCT TeamID from Plays where PayerId=@PayerID) a

return @str

END

--select dbo.fngetTeamIDs(2)

Query start here

drop table #temp,#A,#B,#C,#D

(select PayerID,count(*) count 
into #temp
from Plays 
group by PayerID)


select *
into #A
from #temp as T

where T.count in (
        select T1.count from #temp as T1
        group by T1.count having count(T1.count)>1 
)

select A.*,P.TeamID
into #B
from #A A inner join Plays P
on A.PayerID=P.PayerID
order by A.count


select B.PayerId,B.count, 
(
select dbo.fngetTeamIDs(B.PayerId)
) as TeamIDs
into #C
from #B B
group by B.PayerId,B.count


select TeamIDs 
into #D
from #c as C
group by C.TeamIDs
having count(C.TeamIDs)>1

select C.PayerId,P.PlayerName,D.TeamIDs
from #D D inner join #C C
on D.TeamIDs=C.TeamIDs
inner join Player P
on C.PayerID=P.PlayerID

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.