I have 3 tables that I need for reporting:

    *dates*         
date_sk | full_date         
1       | 2013-01-01            
2       | 2013-02-01            
3       | 2013-03-01            

    *person*            
person_sk   | person_id  | person_name      
1           |   10       |   John       
2           |   11       |   Bob        
3           |   12       |   Jill       



    *person_portfolio*          
person_portfolio_sk | date_sk | person_sk | res_value | report_month
1                   |   1     |     1     |     15    |  2013-01-01
2                   |   1     |     2     |     10    |  2013-01-01
3                   |   1     |     3     |      1    |  2013-01-01
4                   |   2     |     1     |     30    |  2013-02-01

(imagine the 'date' table filled with every date for the past 10 and next 10 years)

For comparison reporting over a date range, I have been struggling to work out how to return a 0 value for a person in every month of the range where that person has no entry. Here is the query I have tried:

SELECT
 p.person_id,
 COALESCE(pp.res_value,0)::NUMERIC(16,2) AS res_value,
 pp.report_month
FROM person p
LEFT JOIN person_portfolio pp
ON p.person_sk = pp.person_sk
LEFT JOIN date d
ON d.date_sk = pp.date_sk
WHERE person_id IN ('10','11','12')
AND pp.report_month >= '2013-01-01' --From Date
AND pp.report_month <= '2013-05-01' -- To Date
AND d.day_number_of_month = 1
ORDER BY p.person_id DESC;

The output I want to return would end up being 15 rows total. 3 people x 5 months of data = 15 total rows. I left out the day_number_of_month column in the date table but it holds the number 1 for the first of each month, 2 for the second, etc (every day of every month is in this table). It should look like this:

person_id   | res_value | report_month
10          |   15      |   2013-01-01
10          |   30      |   2013-02-01
10          |   0       |   2013-03-01
10          |   0       |   2013-04-01
10          |   0       |   2013-05-01
11          |   10      |   2013-01-01
11          |   0       |   2013-02-01
11          |   0       |   2013-03-01
11          |   0       |   2013-04-01
11          |   0       |   2013-05-01
12          |   1       |   2013-01-01
12          |   0       |   2013-02-01
12          |   0       |   2013-03-01
12          |   0       |   2013-04-01
12          |   0       |   2013-05-01

but I am only getting these results:

person_id   | res_value | report_month
10          |   15      |  2013-01-01
10          |   30      |  2013-02-01
11          |   10      |  2013-01-01
12          |    1      |  2013-01-01

So basically... is there currently a feasible way that I could inject the 0 value rows into the results when there is no entry for the 'report_month' for a specific person(s)? I would appreciate any kind of help as I have been working on this for 2 weeks now trying to complete this report. Thanks!

2 Answers


Your description of the output provides guidance on how to solve the problem. First generate the rows, using a cross join. Then bring in the rest of the data.

Given the structure of your query, I don't see the purpose of the date table. If I assume that there is at least one report for each reporting period, I can do:

SELECT p.person_id,
       COALESCE(pp.res_value,0)::NUMERIC(16,2) AS res_value,
       d.report_month
FROM (SELECT DISTINCT person_sk, person_id  -- person_sk is needed for the join below
      FROM person
      WHERE person_id IN ('10', '11', '12')
     ) p CROSS JOIN
     (SELECT DISTINCT pp.report_month
      FROM person_portfolio pp
      WHERE pp.report_month >= '2013-01-01' AND
            pp.report_month <= '2013-05-01'
     ) d LEFT JOIN
     person_portfolio pp
     ON p.person_sk = pp.person_sk AND
        d.report_month = pp.report_month
ORDER BY p.person_id DESC, d.report_month ASC;

However, this is not true in your data: in the sample, nobody has a report for March through May, so those months would never appear. You can generate the dates instead. In your environment, I don't know whether it is better to use generate_series() or the date table. Either way, you would replace the d subquery above with one that returns all the dates of interest.
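
As a rough sketch of the generate_series() option, assuming PostgreSQL and assuming the column types shown (the question does not state them), the whole thing can be tried against the sample data in a scratch session. The temporary tables are only there to make the example self-contained; they shadow any real tables of the same name for the duration of the session.

```sql
-- Scratch copies of the question's tables; the column types are assumptions.
CREATE TEMP TABLE person (person_sk int, person_id int, person_name text);
CREATE TEMP TABLE person_portfolio (
    person_portfolio_sk int, date_sk int, person_sk int,
    res_value int, report_month date
);
INSERT INTO person VALUES (1, 10, 'John'), (2, 11, 'Bob'), (3, 12, 'Jill');
INSERT INTO person_portfolio VALUES
    (1, 1, 1, 15, '2013-01-01'),
    (2, 1, 2, 10, '2013-01-01'),
    (3, 1, 3,  1, '2013-01-01'),
    (4, 2, 1, 30, '2013-02-01');

SELECT p.person_id,
       COALESCE(pp.res_value, 0)::NUMERIC(16,2) AS res_value,
       d.report_month
FROM (SELECT person_sk, person_id
      FROM person
      WHERE person_id IN (10, 11, 12)
     ) p
CROSS JOIN
     -- One row per month in the range, whether or not any report exists.
     (SELECT gs::date AS report_month
      FROM generate_series('2013-01-01'::timestamp,
                           '2013-05-01'::timestamp,
                           interval '1 month') AS gs
     ) d
LEFT JOIN person_portfolio pp
       ON pp.person_sk = p.person_sk
      AND pp.report_month = d.report_month
ORDER BY p.person_id, d.report_month;
```

With the sample data this gives 3 people x 5 months = 15 rows, with 0.00 wherever a person has no report for the month. The sort here is ascending on person_id to match the desired output in the question; switch it back to DESC if that is what the report needs.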

4 Comments

The purpose of the date table in the query is because there are ALL dates (including days) in the date table. I left that out to keep it a bit easier to understand but I would eventually use an AND clause to do 'AND d.day_number_of_month = 1' for this monthly reporting. Thanks a ton for the help. I will check it out!
Gordon: I just applied your solution and it does give me all person_id with 0.00 but only for one month of the result set and not the entire range.
Replace the d subquery with something like (SELECT d.full_date AS report_month FROM date d WHERE d.day_number_of_month = 1 AND d.full_date BETWEEN '2013-01-01' AND '2013-05-01').
Amazing. Thanks a ton. This is now working and returning exactly what I have been trying to return for the last couple weeks. I thought I was going to have to solve this at a PHP level but I am glad I didn't. Thanks Gordon!

Look up "OUTER JOIN".

Untested, but you could try something like this: start with your date table, restrict it to the date range you want, then join it to your other tables. OUTER JOIN says "even if you can't find a person with data on this date, keep the date; I want to see it".

SELECT
 p.person_id,
 COALESCE(pp.res_value,0)::NUMERIC(16,2) AS res_value,
 d.full_date AS report_month
FROM date d
   CROSS JOIN person p  -- pair every selected date with every person
   LEFT OUTER JOIN person_portfolio pp
   ON pp.person_sk = p.person_sk
   AND pp.date_sk = d.date_sk
WHERE p.person_id IN ('10','11','12')
AND d.full_date >= '2013-01-01' --From Date
AND d.full_date <= '2013-05-01' -- To Date
AND d.day_number_of_month = 1
ORDER BY p.person_id DESC, d.full_date;
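
One caveat with outer joins, which probably explains why the query in the question returned only 4 rows: a condition on the outer-joined table placed in the WHERE clause throws away the NULL-filled rows the outer join was meant to keep. A minimal sketch, assuming PostgreSQL; the demo_person and demo_portfolio tables are hypothetical scratch tables, not part of the original schema:

```sql
-- Hypothetical scratch tables with a small slice of the sample data.
CREATE TEMP TABLE demo_person (person_sk int, person_id int);
CREATE TEMP TABLE demo_portfolio (person_sk int, res_value int, report_month date);
INSERT INTO demo_person VALUES (1, 10), (2, 11);
INSERT INTO demo_portfolio VALUES (1, 30, '2013-02-01');

-- Filter in WHERE: person 11 has no portfolio row, so pp.report_month is NULL,
-- the comparison is not true, and the row is discarded. Only person 10 remains.
SELECT p.person_id, pp.res_value
FROM demo_person p
LEFT JOIN demo_portfolio pp ON pp.person_sk = p.person_sk
WHERE pp.report_month = '2013-02-01'
ORDER BY p.person_id;

-- Filter in ON: person 11 is kept with a NULL res_value, which COALESCE maps to 0.
SELECT p.person_id, COALESCE(pp.res_value, 0) AS res_value
FROM demo_person p
LEFT JOIN demo_portfolio pp
       ON pp.person_sk = p.person_sk
      AND pp.report_month = '2013-02-01'
ORDER BY p.person_id;
```

The first query returns one row and the second returns two. Conditions on the table whose rows must all be preserved (here demo_person) can stay in WHERE without this effect.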
