I have a few tables that make up a media catalog of live/studio music, where each media item has zero to many show dates, CDs and vinyl records associated with it. The query I have at the moment pulls out statistics that result in a tabular set of data for all the media items available. I'm now having trouble extending the query to include finer-grained statistics on each associated table.

Schema:

media(id , title)

cd(media_fk, type)

vinyl(media_fk)

gig(id, date)

media_gigs(gig_fk, media_fk)

Query I have thus far:

SELECT
m.id,
m.title,
COUNT(DISTINCT c.id) as cds,
COUNT(DISTINCT v.id) as vinyl,
gig.id as gid, gig.date as gdate
FROM media m
LEFT JOIN cd c on m.id = c.media
LEFT JOIN vinyl v on m.id = v.media
LEFT JOIN media_gigs g on m.id = g.media
LEFT JOIN gig gig on g.gig = gig.id
GROUP BY m.id, gig.id;

Which produces:

id |  title  | cds | vinyl |           gid            |   gdate    
---+---------+-----+-------+--------------------------+------------
 1 | title 1 |   5 |     1 | may-11-1989-kawasaki     | 1989-05-11
 1 | title 1 |   5 |     1 | may-13-1989-tokyo        | 1989-05-13
 2 | title 2 |   6 |     0 | apr-29-1998-nagoya       | 1998-04-29
 2 | title 2 |   6 |     0 | may-6-1998-tokyo         | 1998-05-06
 2 | title 2 |   6 |     0 | may-7-1998-tokyo         | 1998-05-07
 3 | title 3 |   6 |     2 | dec-1-1986-new-york-city | 1986-12-01
 3 | title 3 |   6 |     2 | dec-5-1986-quebec-city   | 1986-12-05
 3 | title 3 |   6 |     2 | nov-19-1986-tokyo        | 1986-11-19
 3 | title 3 |   6 |     2 | nov-20-1986-tokyo        | 1986-11-20

cd.type is an enum type of [silver,cdr,pro-cdr] that I want to add to the results. So the end goal is to have 3 additional columns, each a count of the CDs of one type associated with each media item. I've not found the correct syntax using COUNT or otherwise to aggregate the CDs based on their type, so I'm looking for a push in the right direction. I'm fairly new to SQL, so what I have so far may be a bit naive.

Using PG 9.3.

2 Answers

You can use a CASE expression to test the cd type and SUM the result, as below:

SELECT 
m.id, 
m.title, 
COUNT(DISTINCT c.id) as cds, 
COUNT(DISTINCT v.id) as vinyl, 
gig.id as gid, gig.date as gdate,
-- the cd table is aliased as c, so the type column is c.type
SUM(case c.type
           when 'silver' then 1
           else 0
           end) silver,
SUM(case c.type
           when 'cdr' then 1
           else 0
           end) cdr,
SUM(case c.type
           when 'pro-cdr' then 1
           else 0
           end) pro_cdr
FROM media m
LEFT JOIN cd c on m.id = c.media
LEFT JOIN vinyl v on m.id = v.media 
LEFT JOIN media_gigs g on m.id = g.media 
LEFT JOIN gig gig on g.gig = gig.id
GROUP BY m.id, gig.id;
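
One caveat with a plain SUM here: because both cd and vinyl are joined to media, each cd row is repeated once per matching vinyl row, so the per-type sums can come out too high for media with more than one vinyl. A possible workaround that still runs on 9.3 is to count distinct cd ids inside the CASE, mirroring the COUNT(DISTINCT c.id) already in the query. The sketch below is only an illustration: the temporary tables and sample rows are hypothetical, the column names follow the query rather than the schema listing, and a text column stands in for the enum.

```sql
-- Hypothetical throwaway tables, just enough to show the row multiplication.
CREATE TEMP TABLE media (id int PRIMARY KEY, title text);
CREATE TEMP TABLE cd    (id int PRIMARY KEY, media int, type text);  -- text stands in for the enum
CREATE TEMP TABLE vinyl (id int PRIMARY KEY, media int);

INSERT INTO media VALUES (1, 'title 1');
INSERT INTO cd    VALUES (1, 1, 'silver'), (2, 1, 'cdr');  -- one silver, one cdr
INSERT INTO vinyl VALUES (1, 1), (2, 1);                   -- two vinyl rows

SELECT
m.id,
-- each cd row appears once per vinyl row, so this counts the silver cd twice
SUM(CASE WHEN c.type = 'silver' THEN 1 ELSE 0 END) AS silver_sum,
-- CASE without ELSE yields NULL, which COUNT ignores; DISTINCT removes the repeats
COUNT(DISTINCT CASE WHEN c.type = 'silver' THEN c.id END) AS silver_distinct
FROM media m
LEFT JOIN cd c ON m.id = c.media
LEFT JOIN vinyl v ON m.id = v.media
GROUP BY m.id;
-- returns: id = 1, silver_sum = 2, silver_distinct = 1
```

If a media item never has more than one vinyl row, both forms give the same numbers.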

References:

  1. Conditional Expressions on PostgreSQL 9.3 Manual
  2. Enumerated Types on PostgreSQL 9.3 Manual

1 Comment

Thanks for the example and documentation, exactly what was needed.

As the other poster has mentioned, you can do this with a SUM(CASE WHEN <cond1> THEN 1 ELSE 0 END) construction on the c.type column.

There are some other problems with your SQL I would like to mention:

Incorrect use of LEFT JOIN

You group on a value that might be NULL: gig.id. This is probably because of incorrect use of the LEFT JOIN. Only use left join if you want to keep rows in the result set that have no match in the joining table.

So on the cd table a left join is correct, because you also want to be able to show that there are 0 CDs. On the media_gigs and gig tables you probably want an INNER JOIN, because there always has to be a match.

Edit: It's possible that I mistakenly thought this was incorrect. I assumed from the sample data that you don't want to display media for which there is no gig.
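
To make the difference concrete, here is a minimal sketch with two hypothetical media rows, only one of which has a gig. The temporary tables and their contents are assumptions for illustration, with column names taken from the query in the question.

```sql
-- Hypothetical throwaway tables: media 2 has no gig.
CREATE TEMP TABLE media (id int PRIMARY KEY, title text);
CREATE TEMP TABLE media_gigs (gig text, media int);

INSERT INTO media VALUES (1, 'title 1'), (2, 'title 2');
INSERT INTO media_gigs VALUES ('may-11-1989-kawasaki', 1);

-- LEFT JOIN keeps media 2 and fills its gig column with NULL
SELECT m.id, g.gig
FROM media m
LEFT JOIN media_gigs g ON m.id = g.media
ORDER BY m.id;
-- rows: (1, may-11-1989-kawasaki), (2, NULL)

-- INNER JOIN drops media 2 because it has no matching row
SELECT m.id, g.gig
FROM media m
INNER JOIN media_gigs g ON m.id = g.media
ORDER BY m.id;
-- rows: (1, may-11-1989-kawasaki)
```

Which of the two is right depends on whether media without gigs should appear in the result.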

Non-grouping, non-aggregate columns

In your query you select columns that are neither in the GROUP BY clause nor wrapped in an aggregate function (like SUM or COUNT). While some database dialects may accept this, it is bad practice. For instance, take the following query:

SELECT x, y, SUM(z) FROM t
GROUP BY x;

If y is not functionally dependent on x, that is, if there can be different values of y for one value of x, it is not clear which of these values should be displayed. Therefore you should always write it like this:

SELECT x, y, SUM(z) FROM t
GROUP BY x, y;
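
A small runnable sketch of that ambiguity, using a hypothetical table t with two different y values for the same x. As far as I know, PostgreSQL since 9.1 only allows the ungrouped column when the GROUP BY column is the table's primary key, which may be why the query in the question runs; the table here has no primary key, so the first form is rejected.

```sql
-- Hypothetical throwaway table: x = 1 has two different y values.
CREATE TEMP TABLE t (x int, y text, z int);
INSERT INTO t VALUES (1, 'a', 10), (1, 'b', 20);

-- Rejected by PostgreSQL: column "t.y" must appear in the GROUP BY clause
-- or be used in an aggregate function.
-- SELECT x, y, SUM(z) FROM t GROUP BY x;

-- Grouping on both columns makes the result unambiguous.
SELECT x, y, SUM(z) AS total
FROM t
GROUP BY x, y
ORDER BY y;
-- rows: (1, a, 10), (1, b, 20)
```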

2 Comments

Sorry, just rereading your observations - deleted the old comment where I had misunderstood you. Still not sure I have grokked this though. It is currently producing the correct results for media with and without gigs and adding an INNER JOIN will eliminate non-gig media. Have I missed something?
The only difference between INNER JOIN and LEFT OUTER JOIN is that the latter will retain results in the left table that don't match the join clause in the right table. I probably incorrectly assumed from the sample data that only media for which there are gigs should be displayed.
