1

I have a price table that has gaps in two main variables, dates (sales_date) and the sales channel (channel). I need to fill those gaps for all the combinations of posible skus (ean) and clients (id_client).

At this moment I have been able to fill the dates and channels, but there are some cases where multiple channels are shared across the same date, in those "bizarre" cases my approach is duplicating everything.

Tables

create table prices_master (
   id_price serial primary key,
   sales_date date,
   ean varchar(15),
   id_client int,
   channel varchar(15),
  price float
);

create table channels_master (
   id_channel serial primary key, 
   channel varchar(15)
);

insert into prices_master (sales_date, ean, id_client, channel, price) 
values
('2015-07-01', '7506205801143', 7, 'COMERCIAL',47655),  
('2015-08-01', '7506205801143', 7, 'COMERCIAL',51655),
('2015-12-01', '7506205801143', 7, 'COMERCIAL', 55667),
('2015-12-01', '7506205801143', 7, 'DISTRIBUIDOR', 35667),
('2015-07-01', '5052197008555', 7, 'DISTRIBUIDOR', 7224),
('2015-10-01', '5052197008555', 7, 'DISTRIBUIDOR', 8224);

insert into channels_master (channel) values 
('DISTRIBUIDOR'), ('INSTITUCIONAL'), ('NON_TRADE'), ('COMERCIAL');

My approach

WITH full_dates AS (
    WITH min_max AS (
      SELECT min(prm.sales_date) AS min_date, ((max(prm.sales_date))) :: date AS max_date
      FROM prices_master prm
)
  SELECT generate_series((min_max.min_date) :: timestamp with time zone,
                       (min_max.max_date) :: timestamp with time zone, '1 mon' :: interval) AS sales_date
  FROM min_max), 
completechannels AS (
  SELECT DISTINCT channel
  FROM channels_master
 ), 
temp AS (
  SELECT prices_master.sales_date,
         prices_master.id_client,
         prices_master.ean,
         prices_master.channel,
         prices_master.price,
         lead(
           prices_master.sales_date) OVER (PARTITION BY prices_master.id_client, prices_master.ean, prices_master.channel ORDER BY prices_master.sales_date) AS next_sales_date
  FROM prices_master
  ORDER BY prices_master.id_client, prices_master.ean, prices_master.channel, prices_master.sales_date
 )
SELECT (full_dates.sales_date) :: date AS sales_date,
     temp.id_client,
     temp.ean,
     completechannels.channel,
     price
FROM full_dates
     JOIN temp ON full_dates.sales_date >= temp.sales_date AND 
     (full_dates.sales_date < temp.next_sales_date OR temp.next_sales_date IS NULL)
     JOIN completechannels ON 1=1
     ORDER BY temp.id_client, temp.ean, completechannels.channel, 
     full_dates.sales_date;

My problem comes on code 7506205801143 on sales_date 2015-12-01, since this code has prices for both channels DISTRIBUIDOR and COMERCIAL my approach is duplicating the rows:

My approach result (bad)

+------------+-----------+---------------+---------------+-------+
| sales_date | id_client |      ean      |    channel    | price |
+------------+-----------+---------------+---------------+-------+
| 2015-12-01 |         7 | 7506205801143 | COMERCIAL     | 55667 |
| 2015-12-01 |         7 | 7506205801143 | COMERCIAL     | 35667 |
| 2015-12-01 |         7 | 7506205801143 | DISTRIBUIDOR  | 55667 |
| 2015-12-01 |         7 | 7506205801143 | DISTRIBUIDOR  | 35667 |
| 2015-12-01 |         7 | 7506205801143 | INSTITUCIONAL | 35667 |
| 2015-12-01 |         7 | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 |         7 | 7506205801143 | NON_TRADE     | 55667 |
| 2015-12-01 |         7 | 7506205801143 | NON_TRADE     | 35667 |
+------------+-----------+---------------+---------------+-------+

Expected Result (good)

+------------+-----------+---------------+---------------+-------+
| sales_date | id_client |      ean      |    channel    | price |
+------------+-----------+---------------+---------------+-------+
| 2015-12-01 |         7 | 7506205801143 | COMERCIAL     | 55667 |
| 2015-12-01 |         7 | 7506205801143 | DISTRIBUIDOR  | 35667 |
| 2015-12-01 |         7 | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 |         7 | 7506205801143 | NON_TRADE     | 55667 |
+------------+-----------+---------------+---------------+-------+

For INSTITUTIONAL and NON_TRADE the max price can be used to fill the gaps.

2 Answers 2

1

You're going to find this much easier to do by flipping some of this on its head, and considering the main prices table to be overrides. That is, you want to construct a "base" table of just the (maximum) price for the date/client/ean tuple, and ignoring the channel until later.

First, you need to add the following CTE to the ones you already have (formatting/naming updated to my usual style):

Maximum_Price_Per_Date AS (
    SELECT Date_Range.sales_date, Price_Date_Range.id_client, Price_Date_Range.ean, 
           MAX(Price_Date_Range.price) AS price
    FROM Date_Range
    JOIN Price_Date_Range -- aka TEMP in your original query
      ON Price_Date_Range.sales_date <= Date_Range.sales_date
          AND (Price_Date_Range.next_sales_date > Date_Range.sales_date OR Price_Date_Range.next_sales_date IS NULL)
    GROUP BY Date_Range.sales_date, Price_Date_Range.id_client, Price_Date_Range.ean
)

This makes the set multiplication from the cartesian product (JOIN completechannels ON 1=1 - although normally done via CROSS JOIN) work with you: now there won't be extra rows:

SELECT Maximum_Price_Per_Date.sales_date, Maximum_Price_Per_Date.id_client, Maximum_Price_Per_Date.ean,
       Channel.channel, 
       Maximum_Price_Per_Date.price
FROM Maximum_Price_Per_Date
CROSS JOIN (SELECT DISTINCT channel
            FROM Channels_Master) Channel

generates (uninteresting rows omitted):

| sales_date | channel | id_client     | ean           | price |
|------------|---------|---------------|---------------|-------|
| 2015-12-01 | 7       | 7506205801143 | DISTRIBUIDOR  | 55667 |
| 2015-12-01 | 7       | 7506205801143 | COMERCIAL     | 55667 |
| 2015-12-01 | 7       | 7506205801143 | NON_TRADE     | 55667 |
| 2015-12-01 | 7       | 7506205801143 | INSTITUCIONAL | 55667 |

Now we just have to LEFT JOIN back (again) to the Price_Date_Range CTE again, using the price there, if present:

-- Note that you should have a Calendar table, which would remove this.
WITH Date_Range AS (
    -- You probably should be using an explicit range here, to account for future dates.
    WITH Min_Max AS (
        SELECT MIN(sales_date) AS min_date, MAX(sales_date) AS max_date
        FROM Prices_Master
    ),
    Timezone_Range AS (
        SELECT GENERATE_SERIES(min_date, max_date, CAST('1 mon' AS INTERVAL)) AS sales_date
        FROM Min_Max
    )
    SELECT CAST(sales_date AS DATE) AS sales_date
    FROM Timezone_Range
),
-- This would really benefit by being a MQT - materialized query table
Price_Date_Range AS (
    SELECT sales_date, lead(sales_date) OVER (PARTITION BY id_client, ean, channel ORDER BY sales_date) AS next_sales_date,
           id_client, ean, channel, price
    FROM Prices_Master
), 
Maximum_Price_Per_Date AS (
    SELECT Date_Range.sales_date, Price_Date_Range.id_client, Price_Date_Range.ean, 
           MAX(Price_Date_Range.price) AS price
    FROM Date_Range
    JOIN Price_Date_Range
      ON Price_Date_Range.sales_date <= Date_Range.sales_date
          AND (Price_Date_Range.next_sales_date > Date_Range.sales_date OR Price_Date_Range.next_sales_date IS NULL)
    GROUP BY Date_Range.sales_date, Price_Date_Range.id_client, Price_Date_Range.ean
)
SELECT Maximum_Price_Per_Date.sales_date, Maximum_Price_Per_Date.id_client, Maximum_Price_Per_Date.ean,
       Channel.channel, 
       COALESCE(Price_Date_Range.price, Maximum_Price_Per_Date.price) AS price
FROM Maximum_Price_Per_Date
CROSS JOIN (SELECT DISTINCT channel
            FROM Channels_Master) Channel
LEFT JOIN Price_Date_Range
       ON Price_Date_Range.channel = Channel.channel
          AND Price_Date_Range.id_client = Maximum_Price_Per_Date.id_client
          AND Price_Date_Range.ean = Maximum_Price_Per_Date.ean
          AND Price_Date_Range.sales_date <= Maximum_Price_Per_Date.sales_date
          AND (Price_Date_Range.next_sales_date > Maximum_Price_Per_Date.sales_date OR Price_Date_Range.next_sales_date IS NULL)
ORDER BY Maximum_Price_Per_Date.sales_date, Maximum_Price_Per_Date.id_client, Maximum_Price_Per_Date.ean, Channel.channel

Fiddle example
(thanks to @D-Shih for the setup)
generates (uninteresting rows omitted):

| sales_date | channel | id_client     | ean           | price |
|------------|---------|---------------|---------------|-------|
| 2015-12-01 | 7       | 7506205801143 | COMERCIAL     | 55667 |
| 2015-12-01 | 7       | 7506205801143 | DISTRIBUIDOR  | 35667 |
| 2015-12-01 | 7       | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 | 7       | 7506205801143 | NON_TRADE     | 55667 |
Sign up to request clarification or add additional context in comments.

1 Comment

Excellent!. Thanks for the naming convention, actually made things a lot clearer. It is an MQT, just wanted to keep ti simple.
1

You can try to use ROW_NUMBER window function by sales_date DESC in a subquery to get max row data for each channel

then use coalesce and MAX window function to make it.

Query 1:

WITH pricesCTE as (
   SELECT price,sales_date,id_client,ean,cm.channel,ROW_NUMBER() OVER(PARTITION BY cm.channel ORDER BY sales_date DESC) rn
   FROM (SELECT DISTINCT channel FROM channels_master) cm 
   LEFT JOIN prices_master pm on pm.channel = cm.channel
)
SELECT 
      coalesce(sales_date,MAX(sales_date) OVER(ORDER BY coalesce(price,0) DESC)) sales_date,
      coalesce(id_client,MAX(id_client) OVER(ORDER BY coalesce(price,0) DESC)) id_client,
      coalesce(ean,MAX(ean) OVER(ORDER BY coalesce(price,0) DESC)) ean,
      channel,
      coalesce(price,MAX(price) OVER(ORDER BY coalesce(price,0) DESC)) price
FROM 
(
  select *
  from pricesCTE 
  where rn = 1
) t1

Results:

| sales_date | id_client |           ean |       channel | price |
|------------|-----------|---------------|---------------|-------|
| 2015-12-01 |         7 | 7506205801143 |     COMERCIAL | 55667 |
| 2015-12-01 |         7 | 7506205801143 |  DISTRIBUIDOR | 35667 |
| 2015-12-01 |         7 | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 |         7 | 7506205801143 |     NON_TRADE | 55667 |

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.