I have a situation in SQL (PostgreSQL specifically) that I'm struggling with. The schema/model that I'm working with is not under my control and not something I'm able to alter, so I am trying to figure out the best way to deal with the cards I've been dealt.
First, the schema, simplified for this question: invoice lines (Type = I) and transaction lines (Type <> I) are combined in the same table. There can and will be any number of transaction lines per invoice and any number of invoices per client.
| Id | Type | InvoiceNo | ClientId |
|---|---|---|---|
| 100 | I | 100 | 1 |
| 99 | X | 0 | 1 |
| 98 | S | 0 | 1 |
| 97 | T | 0 | 1 |
| 96 | I | 99 | 1 |
| 95 | X | 0 | 1 |
| 94 | S | 0 | 1 |
What I would ultimately like to end up with is something like the below: the invoice (Type = I) records removed, and the transaction (Type <> I) records that fall after each invoice record populated with its corresponding InvoiceNo value.
| Id | Type | InvoiceNo | ClientId |
|---|---|---|---|
| 99 | X | 100 | 1 |
| 98 | S | 100 | 1 |
| 97 | T | 100 | 1 |
| 95 | X | 99 | 1 |
| 94 | S | 99 | 1 |
So far, the closest I've been able to get, which isn't very close, is using the below SQL:
select
    t1.Id,
    t1.Type,
    t2.InvoiceNo,
    t1.ClientId
from table AS t1
join (select
          Id,
          InvoiceNo,
          ClientId
      from table
      where type = 'I') as t2
    on t1.ClientId = t2.ClientId
where t1.ClientId = t2.ClientId and t1.Id <= t2.Id and t1.Type <> 'I'
The result looks something like the below. It works for the most recent invoice per client, but every transaction under an earlier invoice is repeated once for each invoice with a higher Id:
| Id | Type | InvoiceNo | ClientId |
|---|---|---|---|
| 99 | X | 100 | 1 |
| 98 | S | 100 | 1 |
| 97 | T | 100 | 1 |
| 95 | X | 100 | 1 |
| 95 | X | 99 | 1 |
| 94 | S | 100 | 1 |
| 94 | S | 99 | 1 |
Any help or guidance is much appreciated!
**Updated with a more complex example**
Source:
| Id | Type | InvoiceNo | ClientId |
|---|---|---|---|
| 1 | X | 0 | 1 |
| 2 | I | 97 | 1 |
| 3 | S | 0 | 2 |
| 4 | X | 0 | 2 |
| 5 | S | 0 | 1 |
| 6 | I | 98 | 2 |
| 7 | S | 0 | 1 |
| 8 | X | 0 | 1 |
| 9 | I | 99 | 1 |
| 10 | T | 0 | 1 |
| 11 | S | 0 | 1 |
| 12 | X | 0 | 1 |
| 13 | I | 100 | 1 |
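
To make this easier to reproduce, here is a minimal setup for the source data above. The table name `t` matches the query I use further down; the column types are assumptions for this simplified version, not the real schema.

```sql
-- Minimal reproduction of the source table above.
-- Assumptions: table name t, integer ids, single-character type codes.
create table t (
    Id        integer primary key,
    Type      char(1) not null,
    InvoiceNo integer not null,
    ClientId  integer not null
);

insert into t (Id, Type, InvoiceNo, ClientId) values
    (1,  'X', 0,   1),
    (2,  'I', 97,  1),
    (3,  'S', 0,   2),
    (4,  'X', 0,   2),
    (5,  'S', 0,   1),
    (6,  'I', 98,  2),
    (7,  'S', 0,   1),
    (8,  'X', 0,   1),
    (9,  'I', 99,  1),
    (10, 'T', 0,   1),
    (11, 'S', 0,   1),
    (12, 'X', 0,   1),
    (13, 'I', 100, 1);
```
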
Playing with the answer below, I came up with:
select *
from (select t.*,
             max(InvoiceNo) filter (where type = 'I') over (partition by clientid order by id DESC) as imputed_invoiceno
      from t) as x
where Type <> 'I';
Which gets me close:
| Id | Type | InvoiceNo | ClientId | imputed_invoiceno |
|---|---|---|---|---|
| 12 | X | 0 | 1 | 100 |
| 11 | S | 0 | 1 | 100 |
| 10 | T | 0 | 1 | 100 |
| 8 | X | 0 | 1 | 99 |
| 7 | S | 0 | 1 | 99 |
| 5 | S | 0 | 1 | 99 |
| 1 | X | 0 | 1 | 99 |
| 4 | X | 0 | 2 | 98 |
| 3 | S | 0 | 2 | 98 |
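Best-case result (compared with the table above, the only difference is Id 1, which should pick up invoice 97 rather than 99):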
| Id | Type | InvoiceNo | ClientId |
|---|---|---|---|
| 12 | X | 100 | 1 |
| 11 | S | 100 | 1 |
| 10 | T | 100 | 1 |
| 8 | X | 99 | 1 |
| 7 | S | 99 | 1 |
| 5 | S | 99 | 1 |
| 1 | X | 97 | 1 |
| 4 | X | 98 | 2 |
| 3 | S | 98 | 2 |