PostgreSQL Using COUNT to form statistical results

Question

I have a few tables that make up a media catalog of live/studio music, where each media item has zero or more show dates, CDs and vinyl records associated with it. My current query pulls out statistics as a tabular result set for all the media items available. I'm having trouble extending the query to include finer-grained statistics on each associated table.

Schema:

media(id , title)

cd(media_fk, type)

vinyl(media_fk)

gig(id, date)

media_gigs(gig_fk, media_fk)

Query I have thus far:

SELECT m.id, m.title,
       COUNT(DISTINCT c.id) AS cds,
       COUNT(DISTINCT v.id) AS vinyl,
       gig.id AS gid, gig.date AS gdate
FROM media m
LEFT JOIN cd c ON m.id = c.media
LEFT JOIN vinyl v ON m.id = v.media
LEFT JOIN media_gigs g ON m.id = g.media
LEFT JOIN gig gig ON g.gig = gig.id
GROUP BY m.id, gig.id;

Which produces:

id |  title  | cds | vinyl |           gid            |   gdate    
---+---------+-----+-------+--------------------------+------------
 1 | title 1 |   5 |     1 | may-11-1989-kawasaki     | 1989-05-11
 1 | title 1 |   5 |     1 | may-13-1989-tokyo        | 1989-05-13
 2 | title 2 |   6 |     0 | apr-29-1998-nagoya       | 1998-04-29
 2 | title 2 |   6 |     0 | may-6-1998-tokyo         | 1998-05-06
 2 | title 2 |   6 |     0 | may-7-1998-tokyo         | 1998-05-07
 3 | title 3 |   6 |     2 | dec-1-1986-new-york-city | 1986-12-01
 3 | title 3 |   6 |     2 | dec-5-1986-quebec-city   | 1986-12-05
 3 | title 3 |   6 |     2 | nov-19-1986-tokyo        | 1986-11-19
 3 | title 3 |   6 |     2 | nov-20-1986-tokyo        | 1986-11-20

cd.type is an enum type of [silver,cdr,pro-cdr] that I want to add to the results. The end goal is to have 3 additional columns, each a count of the CDs of one type associated with each media item. I've not found the correct syntax, using COUNT or otherwise, to aggregate the CDs by type, so I'm looking for a push in the right direction. I'm fairly new to SQL, so what I have so far may be a bit naive.

Using PG 9.3.

Joseph B · Accepted Answer · 2014-04-21 21:23:10Z

2

You can use a CASE expression to test the cd type and SUM the result, as below:

SELECT
    m.id,
    m.title,
    COUNT(DISTINCT c.id) AS cds,
    COUNT(DISTINCT v.id) AS vinyl,
    gig.id AS gid,
    gig.date AS gdate,
    -- the cd table is aliased as c, so the type column must be referenced as c.type
    SUM(CASE c.type WHEN 'silver'  THEN 1 ELSE 0 END) AS silver,
    SUM(CASE c.type WHEN 'cdr'     THEN 1 ELSE 0 END) AS cdr,
    SUM(CASE c.type WHEN 'pro-cdr' THEN 1 ELSE 0 END) AS pro_cdr
FROM media m
LEFT JOIN cd c ON m.id = c.media
LEFT JOIN vinyl v ON m.id = v.media
LEFT JOIN media_gigs g ON m.id = g.media
LEFT JOIN gig gig ON g.gig = gig.id
GROUP BY m.id, gig.id;
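
One caveat with the SUM form: cd and vinyl are both joined to media, so each cd row is repeated once per matching vinyl row. A media item with two vinyl records would have its per-type sums doubled. COUNT(DISTINCT ...) is not affected by this repetition, whereas SUM is. A possible sketch that keeps the per-type counts distinct, assuming cd has an id column (the question's query uses c.id even though the listed schema omits it) and the join columns as written in the question:

```sql
SELECT
    m.id,
    m.title,
    COUNT(DISTINCT c.id) AS cds,
    -- CASE without ELSE yields NULL for other types, and COUNT ignores NULLs
    COUNT(DISTINCT CASE WHEN c.type = 'silver'  THEN c.id END) AS silver,
    COUNT(DISTINCT CASE WHEN c.type = 'cdr'     THEN c.id END) AS cdr,
    COUNT(DISTINCT CASE WHEN c.type = 'pro-cdr' THEN c.id END) AS pro_cdr,
    COUNT(DISTINCT v.id) AS vinyl,
    gig.id AS gid,
    gig.date AS gdate
FROM media m
LEFT JOIN cd c ON m.id = c.media
LEFT JOIN vinyl v ON m.id = v.media
LEFT JOIN media_gigs g ON m.id = g.media
LEFT JOIN gig gig ON g.gig = gig.id
GROUP BY m.id, gig.id;
```

This avoids the FILTER clause, which is not available in PostgreSQL 9.3.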

References:


1 Comment

markdsievers Over a year ago

Thanks for the example and documentation, exactly what was needed.

wvdz · Accepted Answer · 2014-04-22 00:08:57Z

0

As the other poster mentioned, you can do this with a SUM(CASE WHEN <cond1> THEN 1 ELSE 0 END) construction on the c.type column.
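
In isolation, the construction might look like the following sketch. It aggregates the cd table on its own, using the column names from the question's query (c.media, c.type), so no other join can repeat the rows being summed:

```sql
SELECT
    c.media,
    -- each row contributes 1 to the column matching its type and 0 to the others
    SUM(CASE WHEN c.type = 'silver'  THEN 1 ELSE 0 END) AS silver,
    SUM(CASE WHEN c.type = 'cdr'     THEN 1 ELSE 0 END) AS cdr,
    SUM(CASE WHEN c.type = 'pro-cdr' THEN 1 ELSE 0 END) AS pro_cdr
FROM cd c
GROUP BY c.media;
```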

There are some other problems with your SQL I would like to mention:

Incorrect use of LEFT JOIN

You group on a value that might be NULL: gig.id. This is probably because of incorrect use of the LEFT JOIN. Only use left join if you want to keep rows in the result set that have no match in the joining table.

So on the cd table a LEFT JOIN is correct, because you also want to be able to show that there are 0 CDs. On the media_gigs and gig tables you probably want an INNER JOIN, because there always has to be a match.

Edit: It's possible that I mistakenly thought this was incorrect. I assumed from the sample data that you don't want to display media for which there is no gig.
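
The difference between the two joins can be sketched with just media and media_gigs, using the column names from the question's query (g.media, g.gig):

```sql
-- LEFT JOIN: every media row appears; gid is NULL for media without gigs.
SELECT m.id, g.gig AS gid
FROM media m
LEFT JOIN media_gigs g ON m.id = g.media;

-- INNER JOIN: media without a matching row in media_gigs are dropped.
SELECT m.id, g.gig AS gid
FROM media m
INNER JOIN media_gigs g ON m.id = g.media;
```

Which one is right depends on whether media without gigs should appear in the statistics.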

Non-grouping, non-aggregate columns

In your query you select columns that you neither group on nor wrap in an aggregate function (like SUM or COUNT). While some DB dialects may accept this, it is bad practice. For instance, take the following query:

SELECT x, y, SUM(z) FROM t
GROUP BY x;

If y is not functionally dependent on x, that is, if there can be different values of y for one value of x, it is not clear which of these values should be displayed. Therefore you should always write it like this:

SELECT x, y, SUM(z) FROM t
GROUP BY x, y;
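
For context, PostgreSQL 9.1 and later accept the shorter form when the GROUP BY column is the table's primary key, because every other column of that table is then functionally dependent on it. That is presumably why the question's query runs while grouping only on m.id and gig.id. A sketch with a hypothetical definition of t (the column types and the key are assumptions):

```sql
-- Hypothetical table: x is the primary key, so y is functionally dependent on x.
CREATE TABLE t (
    x integer PRIMARY KEY,
    y text,
    z integer
);

-- Accepted by PostgreSQL 9.1+ because x is the primary key of t.
SELECT x, y, SUM(z) FROM t GROUP BY x;

-- Alternative that does not rely on the key: aggregate y as well.
SELECT x, MIN(y) AS some_y, SUM(z) FROM t GROUP BY x;
```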

edited Apr 22, 2014 at 0:08

answered Apr 21, 2014 at 21:36


2 Comments

markdsievers Over a year ago

Sorry, just rereading your observations - deleted the old comment where I had misunderstood you. Still not sure I have grokked this though. It is currently producing the correct results for media with and without gigs and adding an INNER JOIN will eliminate non-gig media. Have I missed something?

wvdz Over a year ago

The only difference between INNER JOIN and LEFT OUTER JOIN is that the latter will retain results in the left table that don't match the join clause in the right table. I probably incorrectly assumed from the sample data that only media for which there are gigs should be displayed.
